The present application relates to a prognostic method in the field of infectious disease and critical care, and in particular, to a method of assessing the risk of death in patients with sepsis.
Sepsis (or “blood poisoning”) is a life-threatening condition characterized by systemic inflammation and blood clotting in response to microbial infection. The Global Sepsis Alliance declared that sepsis is a global emergency with about 6 to 8 million lives lost annually. Patients who survive sepsis often endure long-term cognitive and functional declines.
Current management strategies for sepsis are largely supportive and include early administration of broad-spectrum antibiotics, fluid resuscitation, source control and mechanical ventilation. Despite these strategies, the ICU mortality rate from sepsis remains high (15% to 30%) and risk assessment remains a challenge. Sepsis diagnosis is also a challenge since the clinical features of sepsis closely resemble those of non-infectious systemic inflammatory response syndrome (SIRS). Thus, early recognition of sepsis would improve outcomes.
The identification of highly reliable outcome predictors in sepsis is important to stratify or enroll patients in clinical trials of new anti-sepsis therapies, to monitor a patient's response to treatment, to enhance confidence in end-of-life decision making, and to improve health care resource utilization. Various clinical scoring systems have been developed, such as the Acute Physiology and Chronic Health Evaluation (APACHE) II, III, and IV scores, the Multiple Organ Dysfunction Score (MODS) and the Sequential Organ Failure Assessment (SOFA) score. However, these scores have only moderate discriminative power with respect to ICU/hospital mortality. Using Receiver Operating Characteristic (ROC) curves, the predictive powers of single measures of these clinical scores were found to be modest, with areas under the curve (AUCs) ranging from 0.6 to 0.7.
It would thus be desirable to develop a risk assessment tool with improved predictive capabilities with respect to risk of death in sepsis patients.
It has now been found that the risk of mortality in a septic patient changes over time, and such changes are based on a set of six time-varying biological indicators (TVBIs). These TVBIs include cell-free DNA (cfDNA), protein C, lactate, platelet count, creatinine, and Glasgow Coma Score (GCS), which may collectively be used as indicators to identify septic patients at risk of death.
Thus, in one aspect of the present invention, a method of assessing mortality risk in septic patients, for example patients admitted into the ICU, is provided comprising determining in a biological sample obtained from a patient the level of each of cfDNA, protein C, lactate, platelet count, creatinine, and GCS, and comparing the level of each to a baseline, control or normal level, and providing an assessment of mortality risk, wherein an elevated level of any one of cfDNA, lactate and creatinine or a lowered level of any one of protein C, platelets and GCS is indicative of increased risk of death in the patient.
In another aspect of the invention, a method for determining the probability of dying on a specific day or within a certain time frame (such as within 28 days) is provided, comprising computing the probability from the observed values of the 6 indicators (cfDNA, protein C, lactate, platelet count, creatinine, and GCS) of the patient in question and the estimated coefficients of the explanatory variables in the complementary log-log (CLOGLOG) model.
In another aspect of the present invention, a method for monitoring a patient's response to treatment is provided. The method comprises determining in a biological sample obtained from a patient the baseline level of each of cfDNA, protein C, lactate, platelet count, creatinine, and GCS at the onset of treatment, and one or more treatment levels at one or more time points following onset of treatment, comparing the treatment level of each indicator to the baseline level, and providing an assessment of mortality risk, wherein a reduced level of any of cfDNA, lactate and creatinine or an increased level of any of protein C, platelets and GCS indicates that the patient is responding to treatment.
In another aspect of the invention, personalized mortality risk profiles for a patient may be generated based on changing values of the present time-varying biological indicators. The method comprises determining the levels of each of the indicators over time when the patient is septic, and determining the changes in the level of one or more of the indicators that is associated with a decline in the state of the patient and providing a risk profile for the patient which indicates the level of change of the one or more indicators that is indicative of risk of death in the patient.
In another aspect of the invention, a method of detailed ROC analysis is provided for finding the threshold probabilities that can achieve the objectives of (1) maintaining a chosen level of sensitivity, specificity, positive predictive value (PPV), or negative predictive value (NPV), (2) maximizing a weighted sum of these desirable but conflicting measures, and (3) getting the best balance between sensitivity and specificity or between PPV and NPV.
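The three threshold-selection objectives listed above can be illustrated with a minimal sketch. The function names, the candidate thresholds and the toy data are hypothetical and are not taken from the study; a patient is treated as predicted to die when the predicted probability is at or above the threshold, and a measure with a zero denominator is reported as 0.0 for simplicity.

```python
def confusion_measures(probs, died, threshold):
    """Sensitivity, specificity, PPV and NPV at one threshold probability."""
    tp = fp = fn = tn = 0
    for p, d in zip(probs, died):
        predicted = p >= threshold
        if predicted and d:
            tp += 1
        elif predicted:
            fp += 1
        elif d:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Assumption: report 0.0 when the measure is undefined.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def threshold_for_sensitivity(probs, died, candidates, target):
    """Objective (1): highest threshold that keeps sensitivity >= target."""
    ok = [t for t in candidates
          if confusion_measures(probs, died, t)["sensitivity"] >= target]
    return max(ok) if ok else None


def threshold_max_weighted(probs, died, candidates, weights):
    """Objective (2): threshold maximizing a weighted sum of the measures."""
    def score(t):
        m = confusion_measures(probs, died, t)
        return sum(w * m[name] for name, w in weights.items())
    return max(candidates, key=score)


def threshold_best_balance(probs, died, candidates,
                           a="sensitivity", b="specificity"):
    """Objective (3): threshold with the smallest gap between two measures."""
    def gap(t):
        m = confusion_measures(probs, died, t)
        return abs(m[a] - m[b])
    return min(candidates, key=gap)


# Toy data (hypothetical): predicted probabilities and observed deaths.
probs = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
died = [0, 0, 0, 1, 0, 1]
candidates = [0.2, 0.4, 0.6, 0.8]
```

Passing a="ppv" and b="npv" to threshold_best_balance gives the corresponding balance between the predictive values.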
Other features and advantages of the present application will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating embodiments of the application, are given by way of illustration only and the scope of the claims should not be limited by these embodiments, but should be given the broadest interpretation consistent with the description as a whole.
These and other aspects of the invention are described in the detailed description that follows by reference to the following figures.
A method of assessing mortality risk in a septic mammal, for example a patient admitted into the ICU, is provided comprising determining in a biological sample obtained from a patient the level of each of the time-varying biological indicators, cfDNA, protein C, lactate, platelet counts, creatinine, and GCS, and comparing the level of each to a normal or control level, or to a previously determined level, wherein an increase in the level of any of cfDNA, lactate and creatinine as compared to the normal or previously determined level, or a decrease in the level of any of protein C, platelets and GCS as compared to the normal or previously determined level, is indicative of increased risk of death in the mammal.
The term “mammal” includes human and non-human mammals such as a domestic animal (e.g. dog, cat, cow, horse, pig, goat and the like) or a non-domestic animal. A mammal is considered to have sepsis, or to be septic, when body temperature is abnormally high or low, heart rate is high, respiratory rate is high and the mammal has a confirmed or probable infection (by an infectious agent such as a virus, bacteria, fungi such as ringworm, nematodes such as parasitic roundworms and pinworms, arthropods such as ticks, mites, fleas, and lice, and other macroparasites such as tapeworms and other helminths). A human is considered to be septic when, along with at least one dysfunctional organ system and confirmed or suspected infection, the patient has at least three of: i) a core body temperature above 100.4° F. (38.3° C.) or below 96.8° F. (36° C.); ii) a heart rate of ≥90 beats a minute; iii) a respiratory rate of ≥20 breaths a minute, or a PaCO2 (partial pressure of carbon dioxide in arterial blood) of ≤32 mm Hg, or a requirement for mechanical ventilation for an acute respiratory process; and iv) a white-cell count of ≥12,000/mm³ or ≤4,000/mm³ or a differential count showing >10 percent immature neutrophils.
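By way of illustration only, the criteria for a human patient can be written as a simple check. This is a minimal sketch: the thresholds are those stated above, while the Vitals container, its field names and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vitals:
    """Hypothetical container; field names are illustrative only."""
    temp_c: float                   # core body temperature, deg C
    heart_rate: float               # beats per minute
    resp_rate: float                # breaths per minute
    paco2_mmhg: Optional[float]     # arterial PaCO2 in mm Hg, None if not measured
    ventilated: bool                # mechanical ventilation for an acute respiratory process
    wbc_per_mm3: float              # white-cell count per cubic millimetre
    immature_neutrophil_pct: float  # percent immature neutrophils in the differential


def criteria_met(v: Vitals) -> int:
    """Count how many of criteria i) to iv) are satisfied."""
    temperature = v.temp_c > 38.3 or v.temp_c < 36.0
    heart = v.heart_rate >= 90
    respiratory = (v.resp_rate >= 20
                   or (v.paco2_mmhg is not None and v.paco2_mmhg <= 32)
                   or v.ventilated)
    white_cells = (v.wbc_per_mm3 >= 12_000
                   or v.wbc_per_mm3 <= 4_000
                   or v.immature_neutrophil_pct > 10)
    return sum([temperature, heart, respiratory, white_cells])


def is_septic_human(v: Vitals, infection: bool, dysfunctional_organs: int) -> bool:
    """Infection (confirmed or suspected), at least one dysfunctional organ
    system, and at least three of the four criteria."""
    return infection and dysfunctional_organs >= 1 and criteria_met(v) >= 3


example = Vitals(temp_c=38.9, heart_rate=110, resp_rate=24, paco2_mmhg=None,
                 ventilated=False, wbc_per_mm3=9_000, immature_neutrophil_pct=2)
```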
The term “cfDNA”, or “circulating free DNA”, refers to DNA fragments released to the blood plasma, which are generally released by activated neutrophils to aid in killing pathogens. However, the release of excessive amounts of cfDNA can also exert collateral damage to the host by activating blood clotting and inhibiting clot breakdown. The normal level of circulating cfDNA is about 2.2±0.6 μg/ml.
The term “protein C” (also known as vitamin K-dependent protein C preproprotein), is a natural anticoagulant that prevents the accumulation of blood clots in the small vessels of organs. As used herein, protein C encompasses full-length mammalian protein C, including functionally equivalent variants and isoforms thereof, such as human and non-human protein C. Transcript sequences of various forms of full-length protein C are known and readily accessible on sequence databases, such as NCBI, by reference to nucleotide accession nos., e.g. human protein C (NM_000312), mouse protein C (NM_001042767) and canine protein C (NM_001013849.1). Protein C amino acid sequences are also known such as human (NP_000303), mouse (NP_001036232) and canine (NP_001013871.1). Normal levels of protein C are about 61% to 133% of the protein C levels in plasma pooled from healthy volunteers (which is set at 100%). Increased consumption of protein C, i.e. a decrease in the level of protein C from a baseline level, is indicative of sepsis.
The term “lactate” refers to the conjugate base of lactic acid which plays a role in several biochemical processes. The normal level of circulating lactate is about 0.5-1.0 mmol/L. Higher levels of lactate (above a normal or baseline level) are indicative of poor oxygen delivery to organs, presumably due to macro- and/or microcirculatory dysfunction, and may reflect tissue hypoperfusion or cellular hypoxia.
The term “platelets” refers to thrombocytes, a component of blood that functions to prevent bleeding from blood vessel injury by initiating blood clotting. Normally, the number of circulating platelets, or platelet count, is in the range of about 150,000 to 450,000 platelets per microliter of circulating blood. Lower circulating levels of platelets (i.e. below a normal or baseline level) are indicative of sepsis.
Creatinine is a breakdown product of creatine phosphate in muscle, and is usually produced at a fairly constant rate by the body. The normal level of creatinine is about ≤100 μmol/L in the blood. Elevated levels are indicative of kidney failure.
The term “GCS” as used herein refers to the Glasgow Coma Score as previously described by Marshall J C et al. (Crit Care Med 1995; 23: 1638-52), based on the Glasgow Coma Scale originally described by Teasdale G and Jennett B (Lancet 1974; 2: 81-4). The Glasgow Coma Scale provides a practical method for assessment of impairment of conscious level in response to defined stimuli. It is a neurological scale that provides a reliable and objective way of recording the conscious state of a person for initial as well as subsequent assessment by monitoring eye response, verbal response and motor response. A patient is assessed against the criteria of the scale, and the resulting points give a patient score between 3 (indicating deep unconsciousness) and either 14 or 15 (indicating normal, either based on original or more widely used modified or revised scale, respectively). Thus, the greater the score, the more improved the patient. Neurological dysfunction, i.e. a reduced GCS score, is indicative of sepsis.
The level of each of the time-varying biological indicators is determined in order to assess mortality risk in a mammal. A suitable biological sample is obtained to measure one or more of the time-varying biological indicators. For example, cell-free DNA, protein C, lactate and creatinine may be measured in biological fluids such as plasma, serum, whole blood, urine, saliva, sweat, tears, and cerebrospinal fluid (CSF).
Cell-free DNA may be measured using DNA extraction techniques (e.g. including removal of cellular debris by centrifugation, removal of components such as lipids using detergents/surfactants, and removal of protein and RNA using suitable enzymes (proteases and RNase)); detection by UV spectrometry (absorbance at 260 nm), DNA dye staining (e.g. with Picogreen, SYBR-green), microfluidics technology (e.g. Picogreen fluorescent dye-labelling), followed by quantitation using an electric current, polymerase chain reaction (PCR) with sequence-specific primers, restriction enzymes and ethidium bromide staining, slot blot or Southern blotting technology.
Protein C may also be measured in plasma, serum, or whole blood by immunoassay, such as indirect immunoassay, sandwich immunoassay and competitive binding assay, or microfluidics technology using a monoclonal or polyclonal antibody against human protein C. As one of skill in the art would know, antibodies specific for protein C are commercially available (e.g. from Thermofisher, Abcam, Novus Biologicals). Alternatively, antibodies for this purpose may be raised by injecting a non-human host animal, e.g. a mouse or rabbit, with antigen (protein C or immunogenic fragment thereof), and then isolating antibody from a biological sample taken from the host animal.
A preferred immunoassay for use to determine expression levels of target protein in a sample is an ELISA (Enzyme Linked ImmunoSorbent Assay) or Enzyme ImmunoAssay (EIA). To determine the level or concentration of the target protein using ELISA, the target protein to be analyzed is generally immobilized, for example, on a solid adherent support, such as a microtiter plate, polystyrene beads, nitrocellulose, cellulose acetate, glass fibers and other suitable porous polymers, which is pretreated with an appropriate ligand for the target, which is then complexed with a specific reactant or ligand such as an antibody which is itself linked (either before or following formation of the complex) to an indicator, such as an enzyme. Detection may then be accomplished by incubating this enzyme-complex with a substrate for the enzyme that yields a detectable product. The indicator may be linked directly to the reactant (e.g. antibody) or may be linked via another entity, such as a secondary antibody that recognizes the first or primary antibody. Alternatively, the linker may be a protein such as streptavidin if the primary antibody is biotin-labeled. Examples of suitable enzymes for use as an indicator include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), β-galactosidase, acetylcholinesterase and catalase. A large selection of substrates is available for performing the ELISA with these indicator enzymes. As one of skill in the art will appreciate, the substrate will vary with the enzyme utilized. Useful substrates also depend on the level of detection required and the detection instrumentation used, e.g. spectrophotometer, fluorometer or luminometer. Substrates for HRP include 3,3′,5,5′-Tetramethylbenzidine (TMB), 3,3′-Diaminobenzidine (DAB) and 2,2′-azino-bis(3-ethylbenzothiazoline-6-sulphonic acid) (ABTS). Substrates for AP include para-Nitrophenylphosphates. Substrates for β-galactosidase include β-galactosides; the substrate for acetylcholinesterase is acetylcholine, and the substrate for catalase is hydrogen peroxide.
Isoelectric focusing may also be used to measure protein C whereby protein C is separated and quantified according to its isoelectric point within a continuous pH gradient. Protein C can also be quantified using a chromogenic assay in which protein C in the plasma is activated (e.g. by addition of an activator such as the snake venom, Protac) and the level of activated protein C (APC) may be measured by determining change in optical density in the presence of a chromogenic substrate specific to APC (such as S-2366) which results in a colour change and comparing to a standard APC curve. A functional clotting-based assay such as the Activated Partial Thromboplastin Time (APTT) assay may also be used to measure Protein C. Briefly, plasma is incubated at 37° C. with phospholipids, a contact activator (e.g. kaolin), and a protein C activator (e.g. the snake venom, Protac). After a few minutes of incubation, CaCl2 is added to initiate clotting. The time required to clot is recorded, and the protein C concentration is determined from a reference curve of plasma containing different concentrations of protein C.
Lactate and creatinine may be measured in a plasma, serum, or whole blood sample using an enzymatic assay to generate a product that may be detected colorimetrically or fluorometrically by reaction with a selective probe. To measure lactate in a sample, lactate dehydrogenase or lactate oxidase assays may be used. To measure creatinine in a sample, a creatininase assay may be used, in which creatine produced from creatinine is converted to sarcosine, which is oxidized with sarcosine oxidase to yield a product that reacts with a probe for colorimetric or fluorescent quantitation. Lactate can also be measured using electrode methods, such as blood gas analyzers. Lactate can be measured in cerebrospinal fluid (CSF) and other body fluids, while creatinine can also be measured in urine. Colorimetric assays may also be used to measure creatinine, for example, using the Jaffe method in which creatinine is reacted with picric acid to yield a detectable product.
Platelet count is measured in a blood sample obtained from a patient, either from a vein, or finger or heel smear. A hematology analyzer (cell counter) or POC devices for complete blood count (CBC) testing may be used.
The GCS comprises three tests performed at bedside: eye response (rated 1 to 4), verbal response (rated 1 to 5), and motor response (rated 1 to 6). Both the three individual element values as well as their sum are considered important. The following provides a summary of the grading for eye response: does not open eyes (1), opens eyes in response to pressure (2), opens eyes in response to voice (3), and opens eyes spontaneously (4); verbal response: makes no sounds (1); makes sounds (2), words (3), confused disoriented (4), oriented, converses normally (5); and motor response: makes no movements (1), extension to painful stimuli (decerebrate response) (2), abnormal flexion to painful stimuli (decorticate response) (3), flexion/withdrawal to painful stimuli (4), localizes to painful stimuli (5) and obeys commands (6).
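As a minimal sketch of the scoring just described, the three element ratings can be validated against their stated ranges and summed. The function name is hypothetical; the ranges are those given above (eye 1 to 4, verbal 1 to 5, motor 1 to 6).

```python
def gcs_total(eye: int, verbal: int, motor: int) -> int:
    """Sum the three bedside ratings after checking each stated range."""
    limits = (("eye", eye, 4), ("verbal", verbal, 5), ("motor", motor, 6))
    for name, value, top in limits:
        if not 1 <= value <= top:
            raise ValueError(f"{name} response must be between 1 and {top}")
    return eye + verbal + motor
```

For example, a patient who opens eyes to voice (3), is confused (4) and localizes to painful stimuli (5) has a total of 12.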
Once the level of each time-varying biological indicator is determined, the levels may be used to assess mortality risk in a septic mammal. The determined level of each indicator is compared to a normal or control level of the indicator, i.e. a level of the indicator in corresponding healthy individuals. An increase of at least about 1.5-fold in the level of any of cfDNA, lactate and creatinine, or a decrease in the level of protein C to ≤65% of normal levels, of platelets to ≤200×10⁹/L or a 10% decrease/day, or a decrease of GCS to ≤12, is indicative of increased risk of death in the mammal, for example, within 28 days.
The levels of the indicators may alternatively be compared to a previously determined level in a septic mammal being assessed for risk of death. This will provide the change in the level of a given indicator in the septic mammal. An increase of at least about 1.5-fold in the level of any of cfDNA, lactate and creatinine, or a decrease in the level of protein C and platelets by about 1.5-fold from normal, or a decrease in GCS, is indicative of increased risk of death in a mammal.
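The comparison against normal levels can be sketched as follows. This is illustrative only: the dictionary keys and function name are hypothetical, the normal values are the approximate figures given in the definitions above (with the upper end of the lactate range taken as the reference, which is an assumption), and the alternative platelet criterion of a 10% decrease per day is omitted because it requires serial measurements.

```python
# Approximate normal levels from the definitions above (assumed references).
NORMAL = {
    "cfdna_ug_ml": 2.2,          # about 2.2 ug/ml
    "lactate_mmol_l": 1.0,       # upper end of about 0.5-1.0 mmol/L (assumption)
    "creatinine_umol_l": 100.0,  # about <=100 umol/L
}


def risk_flags(levels: dict) -> list:
    """Return the indicators whose level meets a stated risk threshold."""
    flags = []
    # At least about a 1.5-fold increase over normal.
    for key, normal in NORMAL.items():
        if levels[key] >= 1.5 * normal:
            flags.append(key)
    # Protein C at or below 65% of normal pooled plasma.
    if levels["protein_c_pct"] <= 65:
        flags.append("protein_c_pct")
    # Platelets at or below 200 x 10^9/L.
    if levels["platelets_e9_l"] <= 200:
        flags.append("platelets_e9_l")
    # GCS at or below 12.
    if levels["gcs"] <= 12:
        flags.append("gcs")
    return flags


example_levels = {
    "cfdna_ug_ml": 4.0,
    "lactate_mmol_l": 0.9,
    "creatinine_umol_l": 160.0,
    "protein_c_pct": 80.0,
    "platelets_e9_l": 150.0,
    "gcs": 14,
}
```

With these example values, cfDNA, creatinine and platelets are flagged, while lactate, protein C and GCS are not.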
In an assessment of a mammal, it will be appreciated that the determined level of each indicator may be compared to a combination of normal/control indicator levels and previously determined indicator levels. For example, in one embodiment, levels of cfDNA, protein C, platelets and GCS may be compared to a control or normal level (e.g. the initial level of these indicators is utilized for the assessment), while determined levels of lactate and creatinine are compared to a previously determined level of these indicators in the septic mammal (i.e. the change in the level of these indicators is utilized in the assessment).
It will also be appreciated that two or more indicator levels may be evaluated to determine risk of death in a septic patient. The indicators utilized for the assessment may vary from patient to patient within the group of cfDNA, protein C, platelets, lactate, creatinine and GCS.
The indicator levels may also be determined at a plurality of time-points to obtain time-varying indicator values in the risk assessment.
In another aspect of the invention, a method for determining the probability of dying on a specific day or within a certain time frame (such as within 28 days) is provided comprising determining in a biological sample the level of the 6 biological indicators (cfDNA, protein C, lactate, platelet counts, creatinine, and GCS) in comparison to control or previously determined levels of the indicators. The probability of dying is then determined based on a complementary log-log analysis of the levels of one or more of the time-varying indicators as described in the examples.
In another aspect of the invention, personalized mortality risk profiles for a patient may be generated based on changing values of the present time-varying biological indicators. The method comprises determining the levels of each of the indicators over time when the patient is septic, and determining the changes in the level of one or more of the indicators as compared to control levels (or benchmark levels) that is associated with a decline in the state of the patient and providing a risk profile for the patient which indicates the level of change of the one or more indicators that is indicative of risk of death in the patient. A longitudinal logit (L-Logit) model or complementary log-log analysis of the change in indicator levels is conducted as described in the examples. An increased risk of mortality is determined when the profile indicates an increase in the level of any one of cfDNA, lactate and creatinine, or decrease in the level of any one of protein C, platelets and GCS, or an increased probability of death based on the complementary log-log analysis. This method is useful to provide insights into patient-specific pathophysiology, to develop a treatment protocol for a patient, and for prognostic and predictive enrichment.
In another aspect of the present invention, a method for monitoring a patient's response to treatment is provided. The method comprises determining in a biological sample obtained from a patient a baseline level of each of cfDNA, protein C, lactate, platelet count, creatinine, and GCS at the onset of treatment and one or more treatment levels at one or more time points following onset of treatment, and comparing the treatment level of each indicator to the baseline level, wherein a reduced level of any of cfDNA, lactate and creatinine or an increased level of any of protein C, platelets and GCS, i.e. a return of one or more of the indicators to normal levels, indicates that the patient is responding to treatment. Such a method is useful to confirm suitability of a selected treatment, and further, to stratify/enroll patients for clinical trials of new anti-sepsis therapies.
The foregoing methods are beneficial to ascertain the appropriate type and level of care for a septic patient. In particular, the methods are useful to determine appropriate treatment for a given patient, i.e. one or a combination of reducing cfDNA, lactate and/or creatinine levels, and/or increasing protein C, platelet levels, and/or GCS. For example, recombinant ART-123 is a molecule that has been determined to boost protein C levels. Where cfDNA levels are determined to be increased, treatments to inhibit blood clotting may be used, for example, anticoagulants such as heparin, warfarin (Coumadin), Rivaroxaban, Dabigatran, Apixaban, or antiplatelet drugs such as aspirin.
Other treatments for a septic patient may be administered to boost the immune system. These may include treatment with mesenchymal stem cells, herbal remedies such as Echinacea and ginseng, probiotics, and diet enhanced with immune boosting nutrients, vitamins and minerals (e.g. fruits and vegetables, fish (omega-3), shellfish (selenium), zinc-containing foods (beef), garlic (allicin), etc.).
The present methods are also beneficial when considering end-of-life decision making, to enhance confidence in such decisions, and to improve health care resource utilization.
Terms of degree such as “about” or “approximately” as used herein refer to a reasonable amount of deviation from a stated quantity which does not significantly change the end result such as +/−5-10%.
Embodiments of the present invention are described in the following specific examples which are not to be construed as limiting.
To link the mortality risks of patients to the TVBIs, a novel approach for longitudinal analysis, termed the complementary log-log (CLOGLOG) model, was chosen. It has also been found that personalized mortality risk profiles can be generated which highlight the relative contribution of each TVBI for mortality risk.
The TVBIs that are missing in the APACHE II/III/IV, MODS, and SOFA scores but have been shown to have prognostic utility in septic patients include plasma concentrations of lactate, cfDNA, and protein C.
A multi-centre study of 392 septic patients was performed to determine the applicability of this assessment tool for determining the hazards of dying during the patients' stay in ICU/hospital up to 28 days. The assessment tool was also tested on 328 non-septic ICU patients to determine whether the pattern of the effects of the indicators is unique to septic patients. Blood samples were collected at baseline (within 24 hours of meeting the inclusion criteria for sepsis), then daily for the first week, followed by once a week for the duration of the patients' stay in the ICU. Levels of cfDNA, protein C, lactate, platelets, and creatinine may be determined from patient blood samples, whereas the GCS is a neurological scale that measures eye, verbal, and motor responses at the bedside.
A complementary log-log (CLOGLOG) model was used that followed the daily life of each patient until death in ICU/hospital, discharge, or 28 days since admission. The model was used to predict the mortality risk over time and to generate personalized mortality risk profiles that highlight the relative contribution of each TVBI to mortality risk. Each TVBI was represented by three analytical variables: day 1 variable, current variable, and change variable. The first two variables were alternatives for quantifying the level effect, whereas the third variable was for quantifying the change effect. The model using the combination of day 1 and change variables of each indicator is called the “day 1 specification”, whereas the model using the combination of current and change variables is called the “current specification”. The two specifications are complementary in yielding important biological insights. The combination of day 1 and change variables achieved a predictive power (AUC=0.90 (95% CI, 0.86-0.94)) that is similar to the combination of current and change variables. In both specifications, the assessment was done in the context of two preconditions (chronic lung disease and previous brain injury), age, and duration of stay.
The day 1 variables of a subset of the 6 indicators, namely protein C, lactate, and creatinine, were also used to distinguish septic patients from non-septic patients via a binomial logit model, resulting in AUC=0.67 (CI: 0.63 to 0.71). In general, patients with lower protein C, lower lactate, and higher creatinine were more likely to be septic patients.
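The three analytical variables for one indicator can be sketched as follows. The exact definitions are not given in this passage, so the sketch rests on two labeled assumptions: the change variable is taken as the current value minus the day 1 value, and on a day without a measurement the most recent earlier measurement is carried forward. The function name and data layout are hypothetical.

```python
def analytical_variables(series: dict, day: int) -> dict:
    """Build day 1, current and change variables for one indicator.

    series maps the day since admission (1-based) to the measured value.
    """
    if 1 not in series:
        raise ValueError("a day 1 measurement is required")
    if day < 1:
        raise ValueError("day must be 1 or later")
    # Assumption: carry the last available measurement forward to `day`.
    last_observed = max(d for d in series if d <= day)
    day1 = series[1]
    current = series[last_observed]
    # Assumption: change is measured relative to day 1.
    return {"day1": day1, "current": current, "change": current - day1}


# Hypothetical lactate measurements (mmol/L) on days 1, 2 and 7.
lactate = {1: 2.0, 2: 3.5, 7: 1.5}
```

Under these assumptions, the day 1 specification would use the "day1" and "change" entries of each indicator, and the current specification would use the "current" and "change" entries.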
392 patients with sepsis or septic shock and 361 non-septic patients were recruited from nine tertiary hospital ICUs across Canada between November 2010 and January 2013 (the DYNAMICS Study, ClinicalTrials.gov Identifier: NCT01355042). The study was approved by the Research Ethics Boards of all participating centers. Written informed consent was obtained from the patient or substitute decision-maker prior to enrolment into the study. When a priori consent was not feasible, a deferred consent approach was used. All septic events were adjudicated by at least 2 experienced ICU physicians. Adjudications were also performed in the non-septic patients to identify those patients who became septic during the course of their stay in the ICU. In total, 33 out of 361 non-septic patients developed sepsis in the ICU. These 33 patients were removed from the analysis of the non-septic patients, so the number of non-septic patients was reduced to 328.
The inclusion criteria for sepsis were a modification of those defined by Bernard et al. (N Engl J Med 2001; 344: 699-709). Patients were eligible for inclusion into the septic group of this study if they had a confirmed or suspected infection on the basis of clinical data at the time of screening, at least one dysfunctional organ system, 3 or more signs of systemic inflammatory response syndrome (SIRS), and were expected to remain in the ICU for ≥72 hours. The criteria for the presence of organ dysfunction were: (1) SBP≤90 mm Hg or MAP≤70 mm Hg or SBP≤40 mm Hg for at least 1 hour despite fluid resuscitation, adequate intravascular volume status, or use of vasopressor in an attempt to maintain systolic BP≥90 or MAP≥70 mm Hg; (2) P/F Ratio ≤250 in the presence of other dysfunctional organs or systems, or ≤200 if the lung is the only dysfunctional organ; (3) acute rise in creatinine >171 μM or urine output <0.5 ml/kg body weight for 1 hour despite adequate fluid resuscitation; (4) unexplained metabolic acidosis (pH≤7.30 or base deficit ≥5 with lactate >1.5 times the upper limit of normal); and (5) platelet count <50,000 or a 50% drop over the 3 days prior to ICU admission. The inclusion criteria for septic shock are the same as those for sepsis except that the patient must be on vasopressors within the previous 24 hours. Patients were excluded if they were <18 years old, were pregnant or breastfeeding, or were receiving palliative care only.
To meet the inclusion criteria for non-sepsis, patients must have been classified as: (A) patients with multiple trauma with an episode of shock who were expected to remain in the ICU for ≥72 hours (shock must have been present within the previous 24 hours and may have resolved at the time of enrolment). Shock is defined as SBP≤90 or MAP≤70 mm Hg or SBP<40 from baseline, or lactate >1.5 times the upper limit of normal, or other evidence of acute organ dysfunction; or (B) critically ill patients who were expected to remain in the ICU for ≥72 hours (e.g. intracerebral hemorrhage, subarachnoid hemorrhage, subdural hemorrhage); or (C) patients with non-septic shock (e.g. cardiogenic shock, hypovolemia, heat shock, burns requiring mechanical ventilation, pulmonary embolism, abdominal aortic aneurysm) who were expected to remain in the ICU for ≥72 hours (shock must have been present within the previous 24 hours and may have resolved at the time of enrolment).
Baseline characteristics include demographic information, organ function, pre-existing chronic conditions, sites of infection, types of infection, APACHE II score, and use of vasopressor/inotropes. Daily data included microbiologic culture results, organ function, hematologic and other laboratory tests, and type and quantity of resuscitation fluid.
In the septic patients, 88% of the admissions were medical, 94% of the patients required mechanical ventilation, and 67% required vasopressors or inotropes. The main site of infection was the lung accounting for 42% of the patients. The 28-day mortality rate was 23.5%.
In the non-septic patients, 75% of the admissions were medical, 90% of the patients required mechanical ventilation, and 43% required vasopressors or inotropes. The 28-day mortality rate was 18.3%.
The patient blood samples were collected within 24 hours of meeting the inclusion criteria for severe sepsis. Blood samples and clinical data were obtained at baseline, then daily for the first week, followed by once a week for the duration of the patients' stay in the ICU. The blood was processed within two hours of blood collection. Briefly, blood (10 ml each) was collected from existing arterial or venous lines (or by venipuncture with a 20-gauge needle) into Becton Dickinson buffered sodium citrate vacutainer tubes (0.105M trisodium citrate). The blood was centrifuged at 1,500×g for 10 min at 20° C., and the plasma was stored as 200 μL aliquots at −80° C. and thawed at the time of assays.
Plasma samples were obtained from 33 healthy adult volunteers who were not receiving any medication at the time of blood sampling. No attempt to match cases and controls was made.
Levels of the six indicators were determined as follows:
Lactate and creatinine were measured via enzymatic digestion using commercially available assays, namely, the Lactic Acid assay (lactic acid conversion to pyruvate and hydrogen peroxide by lactate oxidase) run on the ARCHITECT cSystem by Abbott, and the Creatinine assay (Kinetic Alkaline Picrate: creatinine reaction with picrate to form a creatinine-picrate complex at an alkaline pH) run on Abbott's ARCHITECT c Systems and AEROSET System.
A hematology analyzer (cell counter) was used to measure platelet count.
In this study, cfDNA was isolated from 200 μL of plasma using the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, Calif.). The concentration of the DNA was measured by UV absorbance at 260 nm using a spectrophotometer (BioPhotometer Plus spectrophotometer, Eppendorf, Mississauga, ON). The purity of the DNA was confirmed by determining the OD260/OD280 ratio.
Plasma levels of protein C antigen were quantified by an enzyme immunoassay (Affinity Biologicals Inc., Ancaster, ON).
The GCS was measured at the bedside.
Statistical analyses: The mortality risks within 28 days since admission were assessed to formulate a multivariate model in the following way. Using a day as the unit of time, a longitudinal approach was used that follows the daily life of each patient until (1) death in ICU or hospital, or (2) discharge from hospital, or (3) the date of censoring on day 28 since ICU admission. Let Hit be the daily hazard of dying of the ith patient on day t. The model linking Hit to the explanatory variables is:
$$H_{it} = e^{\beta_0 + \beta' X_{it}} \quad \text{for } i = 1, 2, \ldots, n \text{ and } t = 1, 2, \ldots, T_i \qquad (1)$$
where β0 is an unknown intercept; β′ is a row vector of unknown coefficients; Xit is a column vector of the explanatory variables that reflect the relevant information of the ith patient up to day t; n is the number of patients in the sample; and Ti is the day when the ith patient died in ICU/hospital, was discharged, or was censored (day 28). A discharge is defined as the transfer of a live patient from the ICU or hospital to home or other institution where the information on mortality status was no longer collected. Since each live patient is censored on day 28, both the day of death and the day of discharge are ≤28. Implicit in this formulation is the simplifying assumption that the hazard remains constant through all time points within each day. For simplicity, t is called the “current day” and Ti is called the “last day” of the ith patient since admission.
For each of the six time-varying biological indicators (TVBIs), the following three analytical variables were defined: (1) the day 1 variable, which assumes the same day 1 value of the indicator for all t; (2) the current variable, which in its simple form assumes the observed (directly observed or imputed) value of the indicator on day t; and (3) the change variable, which is defined as the current variable minus the day 1 variable. Any value of a current variable that is not directly observed was imputed as follows. If the day in question is preceded by at least one day with directly observed value and is followed by at least one day with directly observed value, then it is linearly interpolated from the two closest observed values. Otherwise, it is set to be equal to the nearest observed value.
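The imputation rule just described can be sketched as follows. This is a minimal illustration, not the SAS program used in the study; the function name `impute_daily_values` and the sample lactate series are hypothetical.

```python
from typing import List, Optional, Sequence


def impute_daily_values(values: Sequence[Optional[float]]) -> List[float]:
    """Fill unobserved days (None) following the rule in the text:
    interpolate linearly between the two closest observed days when an
    observed day exists on both sides, otherwise copy the nearest one."""
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    filled = []
    for day, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        before = [i for i in observed if i < day]
        after = [i for i in observed if i > day]
        if before and after:
            lo, hi = before[-1], after[0]
            frac = (day - lo) / (hi - lo)
            filled.append(values[lo] + frac * (values[hi] - values[lo]))
        elif before:
            filled.append(float(values[before[-1]]))
        else:
            filled.append(float(values[after[0]]))
    return filled


# Hypothetical lactate series for days 1 to 6; None marks a day without a measurement.
lactate = [None, 4.0, None, None, 1.0, None]
imputed = impute_daily_values(lactate)
```

Days before the first measurement and after the last one simply carry the nearest observed value, so no extrapolation beyond the observed range takes place.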
To reduce the risk of making misleading inferences from observational data, the model includes the following explanatory variables for representing the relevant context in both specifications of the CLOGLOG model. First, two dummy variables were used to represent the presence or absence of the preconditions of chronic lung disease and previous brain injury. Second, age was used to represent the demographic background. Third, the duration of stay and its natural log transformation were used to represent the temporal pattern of the hazard that resulted from a balance of several processes (e.g. the death process that tended to remove relatively sick patients from the sample, and the discharge process that tended to remove relatively healthy patients). Duration and log(duration) were used as two of the explanatory variables (1) to help capture the temporal pattern of the hazard of dying and (2) to prevent selection biases from resulting in misleading findings. Mathematically, this specification of the time function expresses the dependence of hazard on duration as a product of an exponential function and a power function. It has the advantage of being highly flexible in reflecting the temporal pattern in the data. Using duration and log(duration) as the only two explanatory variables in the CLOGLOG model, the estimated function $\hat{H}_t = e^{-4.4855 - 0.0755t}\, t^{0.4548}$, where t is the duration of stay and $\hat{H}_t$ is the estimated hazard of dying on day t, was obtained. This function is represented by the smooth grey curve in
$$P_{it} = 1 - e^{-H_{it}} \qquad (2)$$
Let Yit be a dummy variable that assumes the value of 1 if the ith person died on day t. The unknown coefficients are then estimated by maximizing the following log-likelihood function:
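The log-likelihood expression itself is not reproduced in this text. Under the definitions above (Yit as the death indicator and Pit as the daily probability of dying), a standard Bernoulli form that such a model would maximize can be written as follows. This is an assumed reconstruction from those definitions, not the original equation:

$$\ln L = \sum_{i=1}^{n} \sum_{t=1}^{T_i} \left[ Y_{it} \ln P_{it} + (1 - Y_{it}) \ln(1 - P_{it}) \right]$$

Each person-day contributes one term, so a patient who died on day Ti contributes Ti − 1 survival terms followed by a single death term.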
To carry out the estimation, the Logistic procedure of SAS with the option of LINK=CLOGLOG was used (Allison, P. D. Survival Analysis Using SAS, 2010, pp. 240-247). Note that Eq. (2) can be rewritten as:
$$\ln(-\ln(1 - P_{it})) = \beta_0 + \beta' X_{it} \qquad (4)$$
Since the left-hand-side is called a complementary log-log function of Pit, this model is called a complementary log-log (CLOGLOG) model. The CLOGLOG model is similar to but more versatile in yielding longitudinal insights than the Cox proportional hazards model, as the CLOGLOG model is free from the restriction of the proportional hazards assumption and is capable of generating the predicted probability of dying on any day or in any time interval. The similarity is in the expression of the dependent variable (the daily hazard of dying) as an exponential function of explanatory variables. The versatility derives from the ease of including a large number of time-varying variables and the replacement of the maximum partial-likelihood method by the maximum likelihood method for estimation. The maximum partial-likelihood method does not represent the removed time-dependent part of the model by an unknown constant and hence does not generate an estimated intercept, which is needed for computing the predicted hazard that is to be translated into easily interpretable probabilities. Instead of starting with Eq. (4), in which the model may be considered as a discrete-time model, formulation of the model with Eq. (1) is a continuous-time model. It is easier to see from Eq. (1) that the exponential transformation of the kth element of β is the hazard ratio for the kth explanatory variable.
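The relationship among Eq. (1), Eq. (2), and Eq. (4), and the reading of exponentiated coefficients as hazard ratios, can be checked numerically. The sketch below uses hypothetical coefficients and covariate values chosen only for illustration; none of them are estimates from the study.

```python
import math


def daily_hazard(intercept, coefs, x):
    """Eq. (1): H = exp(beta0 + beta'x)."""
    return math.exp(intercept + sum(b * v for b, v in zip(coefs, x)))


def daily_probability(hazard):
    """Eq. (2): P = 1 - exp(-H)."""
    return 1.0 - math.exp(-hazard)


def cloglog(p):
    """Left-hand side of Eq. (4): ln(-ln(1 - P))."""
    return math.log(-math.log(1.0 - p))


# Hypothetical values for illustration only (not estimates from the study).
beta0 = -4.0
betas = [0.2, -0.1]
x = [5.0, 10.0]

linear_predictor = beta0 + sum(b * v for b, v in zip(betas, x))
h = daily_hazard(beta0, betas, x)
p = daily_probability(h)

# Applying the complementary log-log link to P recovers the linear predictor.
recovered = cloglog(p)

# Raising the first variable by one unit multiplies the hazard by exp(beta_1).
hazard_ratio_first = math.exp(betas[0])
h_plus_one = daily_hazard(beta0, betas, [x[0] + 1.0, x[1]])
```

Because the hazard is an exponential function of the linear predictor, the ratio `h_plus_one / h` does not depend on the values of the other variables.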
For several reasons, the CLOGLOG model is preferred over the conventional logistic model of the form:
$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta' X_i \qquad (5)$$
where Pi is the ith patient's probability of dying in 28 days. First, for these data, the latter involves the unrealistic assumption that none of the discharged patients died, whereas the former does not. Second, the former can reveal the temporal pattern of the risk of dying, whereas the latter cannot. Third, for the assessment of treatment effect, the latter often yields spurious findings, whereas the former does not. For example, the CLOGLOG model demonstrates that clinicians had a strong tendency to apply vasopressors to sicker patients so that the dummy variable representing the application of vasopressors had a very high hazard ratio (HR) of 5.6, and that the beneficial effect of vasopressors in reducing mortality risk became statistically significant after the initial week, causing the HR to decrease sharply to 1.3. In contrast, the conventional logistic model yielded an odds ratio of 2.4 for the dummy variable, incorrectly suggesting that the application of vasopressors resulted in worse mortality outcomes. Note that in both models, age and the preconditions of chronic lung disease and previous brain injury were used as contextual variables, and that in the CLOGLOG model, duration and log(duration) were used as additional contextual variables to represent the overall temporal pattern of the hazard.
A more useful version of the logistic model is of the form:
$$\ln\left(\frac{P_{it}}{1 - P_{it}}\right) = \beta_0 + \beta' X_{it} \qquad (6)$$
where Pit is the ith patient's probability of dying on day t, conditional on surviving to the beginning of day t. If the explanatory variables are skillfully specified, and if the additive components of the predicted “logit” (log of odds) are also skillfully used, then the qualitative insights obtained from this model will be very similar to those obtained from the CLOGLOG model, and the model can be considered the discrete-time analogue of the CLOGLOG model. Because of its usefulness for longitudinal data analysis, this version is here named the Longitudinal Logit Model.
The model was applied to the septic and the non-septic groups separately. The observations in the input data file for each group are the daily records of all patients with observed values for the explanatory variables. The number of observations for each patient who died in ICU or hospital is equal to the number of days from admission to the day of death, whereas the number of observations for each patient who was discharged is equal to the number of days from admission to the day of discharge. Each of the censored patients contributes 28 observations. The input data matrix has a simple structure. Each row represents a person-day, in which the information of all explanatory variables is used to enhance the likelihood of the value of the outcome variable (Yit). The original data file for all 392 septic patients contained 7,298 observations (rows).
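The person-day layout of the input data matrix can be sketched as follows. The helper name `expand_to_person_days` and the three example patients are hypothetical, and only the outcome column Yit is built here; the explanatory variables would be attached to each row in the same way.

```python
def expand_to_person_days(patient_id, last_day, died, max_day=28):
    """Return one row per day from day 1 to the patient's last day.

    Y is 1 only on the day of death. Patients still followed at max_day
    are censored there and contribute max_day rows with Y = 0.
    """
    n_days = min(last_day, max_day)
    died_in_window = died and last_day <= max_day
    return [
        {"patient": patient_id, "day": day, "Y": int(died_in_window and day == n_days)}
        for day in range(1, n_days + 1)
    ]


# Hypothetical patients: one death on day 11, one discharge on day 7, one censored at day 28.
rows = []
rows += expand_to_person_days("A", last_day=11, died=True)
rows += expand_to_person_days("B", last_day=7, died=False)
rows += expand_to_person_days("C", last_day=28, died=False)

total_person_days = len(rows)
total_deaths = sum(r["Y"] for r in rows)
```

With this layout the number of rows equals the total number of person-days, and each patient contributes at most one row with Y = 1.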
Since the Logistic procedure of SAS does not compute the 95% confidence interval for AUC (the area under the curve showing the relationship between sensitivity and 1-specificity in the ROC analysis), a SAS module was written to carry out the computation. In this module, the standard deviation of the AUC was computed according to the algorithm developed by Hanley and McNeil (Radiology. 1982; 143:29-36). To better reflect the effects of the sample size and the number of unknown coefficients on the width of the confidence interval, the critical value for constructing the confidence interval was taken from a t-distribution rather than from the standard normal distribution.
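A minimal Python sketch of this computation is given below, using the Hanley and McNeil (1982) standard-error formula. It is not the SAS module referred to above. The degrees of freedom used for the t-distribution are not stated in the text, so the critical value is passed in as a parameter; the AUC, group sizes, and critical value in the example are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC following Hanley and McNeil (1982).

    n_pos is the number of cases with the event (deaths) and n_neg the
    number without it.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, critical_value):
    """Symmetric interval auc +/- critical_value * SE, clipped to [0, 1]."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - critical_value * se), min(1.0, auc + critical_value * se)


# Hypothetical inputs for illustration; 1.97 stands in for a t critical value.
lower, upper = auc_confidence_interval(0.85, n_pos=80, n_neg=270, critical_value=1.97)
```

A t critical value is slightly larger than the normal value of 1.96, so the resulting interval is slightly wider, which is the effect described above.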
Since the daily probability of dying considered herein is a conditional probability, the probability of dying in 28 days should not be computed by adding up 28 daily probabilities of dying. The probability of dying in 28 days implied by the daily hazard of dying H is $1 - e^{-28H}$.
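The conversion can be illustrated with a short sketch. The daily probability of 12% is the rounded value reported for the Table 3 example patient; the function names are hypothetical, and the hazard is assumed constant over the 28 days, as the formula requires.

```python
import math


def hazard_from_daily_probability(p_day):
    """Invert P = 1 - exp(-H) to recover the daily hazard H."""
    return -math.log(1.0 - p_day)


def probability_within(hazard, days=28):
    """Probability of dying within `days` days at a constant daily hazard."""
    return 1.0 - math.exp(-days * hazard)


p_day = 0.12                      # rounded daily probability from the Table 3 example
h = hazard_from_daily_probability(p_day)
p_28 = probability_within(h)      # roughly 0.97

# Equivalent route: one minus the probability of surviving 28 consecutive days.
p_28_via_survival = 1.0 - (1.0 - p_day) ** 28

# Adding up the daily probabilities is not valid: the sum exceeds 1.
naive_sum = 28 * p_day
```

The result of about 97% agrees with the 28-day probability reported for the same patient, whereas the naive sum of 3.36 is not a probability at all.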
For the prognosis of individual patients, the present method can generate the threshold probabilities (1) for any chosen levels of sensitivity, specificity, PPV, and NPV, (2) for the best weighted sum of these desirable but conflicting measures, and (3) for the best balance between sensitivity and specificity or between PPV and NPV. This capability originates from the computer algorithm that generates the set of the predicted probabilities of dying within 28 days (or any reasonable duration) for all septic patients, from which a detailed list of threshold probabilities was used to compute the values of these four measures.
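How a threshold probability maps to the four measures, and how a threshold balancing sensitivity against specificity might be selected, can be sketched as follows. The predicted probabilities, outcomes, candidate thresholds, and function names are hypothetical; the source does not describe its algorithm at this level of detail.

```python
def classification_measures(probs, died, threshold):
    """Classify a patient as high risk when the predicted probability >= threshold."""
    tp = fp = fn = tn = 0
    for p, d in zip(probs, died):
        predicted = p >= threshold
        if predicted and d:
            tp += 1
        elif predicted:
            fp += 1
        elif d:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def most_balanced_threshold(probs, died, thresholds):
    """Pick the candidate threshold with the smallest |sensitivity - specificity|."""
    def gap(th):
        m = classification_measures(probs, died, th)
        return abs(m["sensitivity"] - m["specificity"])
    return min(thresholds, key=gap)


# Hypothetical predicted 28-day probabilities and observed outcomes (1 = died).
probs = [0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
died = [1, 1, 0, 1, 0, 0, 0, 0]

at_half = classification_measures(probs, died, 0.5)
balanced = most_balanced_threshold(probs, died, [0.25, 0.35, 0.5, 0.7])
```

The same sweep can target a chosen level of any single measure, or a weighted sum of the four, by changing the criterion inside `most_balanced_threshold`.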
From the records of 355 septic patients and 288 non-septic patients with non-missing values, the input data sets of the two groups of patients for the CLOGLOG model have 6,712 and 4,950 observations (person-days), respectively. Table 1 shows the estimated results of the day 1 specification of the CLOGLOG model (Panel A for septic patients and Panel B for non-septic patients).
For both groups of patients, the signs of the estimated coefficients of the variables representing all 6 indicators turned out to be physiologically sensible (i.e. positive for cfDNA, lactate, and creatinine, and negative for protein C, platelets, and GCS). With respect to the temporal pattern, the signs of the estimated coefficients of “Log(Duration of Stay)” and “Duration of Stay” turned out to be positive and negative, respectively, implying that the hazard of dying increased first and then declined. It is likely that the increase during the first few days reflected the selection bias resulting from removing patients who were expected to die within the first 72 hours. The decrease partly reflects the cumulative benefits of the treatments and care received by the patients in ICU and hospital. The area under the curve in ROC analysis was 0.891 (CI: 0.850 to 0.932) for septic patients and 0.936 (CI: 0.891-0.982) for non-septic patients. Note that the values of hazard ratio, which can be used to assess the relative importance of the explanatory variables, are based on the assumption that all time-varying indicators are comparable after being divided by their respective standard deviations. Also note that protein C and platelets did not have significant effects for non-septic patients.
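The rise-then-fall shape implied by a positive coefficient on log(duration) and a negative coefficient on duration can be illustrated with the duration-only estimates quoted earlier (intercept −4.4855, duration −0.0755, log(duration) 0.4548). The sketch below evaluates that function only; it does not use the Table 1 coefficients, and the variable names are hypothetical.

```python
import math

# Duration-only CLOGLOG estimates quoted in the text.
B0 = -4.4855
B_DURATION = -0.0755
B_LOG_DURATION = 0.4548


def hazard(t):
    """Estimated daily hazard: exp(B0 + B_DURATION * t) * t ** B_LOG_DURATION."""
    return math.exp(B0 + B_DURATION * t) * t ** B_LOG_DURATION


# Setting the derivative of log-hazard (B_DURATION + B_LOG_DURATION / t) to zero
# gives the turning point of the curve.
peak_day = B_LOG_DURATION / -B_DURATION

daily = {t: hazard(t) for t in range(1, 29)}
peak_integer_day = max(daily, key=daily.get)
```

For these estimates the hazard peaks at about day 6 and declines afterwards, which matches the increase-then-decline pattern described above.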
Table 2 shows the estimated results of the current specification of the CLOGLOG model (Panel A for septic patients and panel B for non-septic patients). The main difference from the results of the day 1 specification of the model was the reduction in the number of change variables with significant coefficients for both septic and non-septic patients. This difference suggests that the current variables retained most of the predictive powers of not only the day 1 variables but also the change variables. In other words, using the most up-to-date information of the 6 indicators largely removed the need to pay attention to the changes from day 1. For septic patients, only the change variables of GCS and creatinine remained in the model. For non-septic patients, none of the change variables remained. The area under the curve in ROC analysis was 0.886 (CI: 0.844 to 0.929) for septic patients and 0.940 (CI: 0.896-0.984) for non-septic patients.
As shown in Table 3, this risk assessment tool calculates the mortality hazard and probabilities of dying for a septic patient who died on day 11 (based on the estimated coefficients of the current specification of the CLOGLOG model for septic patients). For this patient, the predicted probabilities of dying on day 11 and within 28 days are 12% and 97%, respectively.
A small flaw in the SAS program for imputation was fixed, resulting in the addition of one more septic patient without missing values for the explanatory variables. Consequently, the number of observations of the input data matrix was increased to 6,724 from 356 patients. The number of patients with missing values was: 2 patients for cfDNA, 3 patients for protein C, 1 patient each for platelets and creatinine, and 32 patients for lactate. The proportion of non-survivors remained at 24% after the removal of the patients with missing values. Table 4 shows the estimation results of the day 1 and current specifications of the CLOGLOG model for these 356 patients without missing values. The day 1 specification combines the day 1 variable with the change variable whereas the current specification combines the current variable with the change variable. In both specifications, the level and/or change variables of three TVBIs (cfDNA, lactate, and creatinine) have positive estimated coefficients, indicating that higher values of these variables are associated with greater hazards of dying. In contrast, the estimated coefficients for the corresponding variables of protein C, platelets, and GCS are negative, indicating the opposite association with the hazard of dying. The estimated coefficients of two preconditions (chronic lung disease and previous brain injury) as well as age were also positive, suggesting that the presence of these preconditions as well as advanced age increase the hazard of dying.
Although the p-values associated with the variable “Duration” in Table 4 were somewhat large (0.106 in the day 1 specification and 0.125 in the current specification), it was chosen not to set its coefficient to 0 for the following reason. Since “Log(Duration)” is a monotonically increasing function of “Duration”, they must be positively correlated. This correlation contributed to the inflation of the standard error of the estimated coefficient of “Duration” and hence the inflation of the p-value. The dotted dark and light grey curves in
Keeping selection biases in mind, it is not surprising that the sharp rise in the hazard from day 1 to day 5 is inconsistent with the fact that the daily averages of five of the six TVBIs improved during the same time interval: cfDNA decreased by 3.8%, protein C increased by 25.5%, creatinine decreased by 19.4%, GCS increased by 18.0%, and lactate decreased by 30.7%. This inconsistency was resolved in the CLOGLOG model by the positive estimated coefficient of “log(duration)”. It is noted that the omission of both duration and log(duration) from the Day 1 specification of the model led to the effects of most TVBIs being underestimated or, in the case of the day 1 variable of lactate, even non-significant (p=0.221). This finding reveals a shortcoming of the partial-likelihood approach commonly adopted by users of the Cox model: as a consequence of removing any function of time, the estimated effects of some TVBIs became misleading. In other words, the partial-likelihood approach is associated with a high risk of making misleading inferences from observational data which are subject to various selection biases. This finding also reveals that the above-mentioned eligibility criterion should not be used any more. It is worth noting that among the patients contributing to the input data, one died on day 1, 4 died on day 2, and 8 died on day 3, suggesting that some clinicians realized that this criterion had the undesirable effect of resulting in not only selection bias but also loss of valuable information and hence chose not to use it.
Since the dependent variable in this CLOGLOG model is the daily hazard of dying, the logistic procedure of SAS automatically uses the predicted daily probabilities of dying for all records (person-days) to conduct the ROC analysis and compute the value of AUC. In other words, the value of AUC was computed by using the daily probability of dying as the classifier. For the day 1 and current specifications of the CLOGLOG model, it yielded the values of 0.865 (95% CI, 0.826-0.903) and 0.866 (95% CI, 0.828-0.904) for AUC. These face values should not be compared with the AUC values computed in other studies that used a logistic model with the dependent variable being the probability of dying in 28 days because in those studies the probability of dying in 28 days was used as the classifier. Simulations that compared the AUC values computed by the two classifiers (the daily probability of dying versus the probability of dying in 28 days) were conducted for the same set of the daily hazards of dying. It was found that the former was markedly smaller than the latter in most cases. An explanation for this large difference is that predicting whether a patient would die on a given day is more difficult than predicting whether a patient would die in 28 days. The simulations also indicate that the difference in the face values of AUC between the two classifiers tends to become larger when the level of hazards is raised and when the predictive power of the model becomes stronger.
To make the face values of AUC comparable to those in other studies, the AUC was recomputed in the following way. From the large input file of person-day records, the record of the last day for each patient was selected. For each patient, the selected record was used to compute his/her predicted daily hazard of dying, based on the estimated coefficients of the CLOGLOG model. The predicted hazards were then transformed into the predicted probabilities of dying in 28 days by the formula $1 - e^{-28H}$.
To enhance the model's predictive power and to obtain biologically meaningful insights in regard to changes in TVBI values, the day 1 as well as the current variables of protein C, platelet count, and creatinine were log-transformed as their effects on mortality differences among patients tend to become negligible for patients with scores higher than the normal level. Mathematically, this transformation implies the replacement of an exponential function by a power function in the dependence of the hazard on the level variable in question. Since a power function has a flatter tail than an exponential function, the negligible effects are better represented by the former.
The model's predictive power was further enhanced by replacing the simple difference between current and day 1 variables of some TVBIs with proportional changes. Let X1 be the day 1 variable, and Xt be the current variable of the TVBI in question. This alternative specification of its change variable is
$$\ln\left(1 + \frac{\Delta X}{X_1}\right)$$
where ΔX = Xt − X1 is the simple change, 1 + ΔX/X1 is the change factor, and ΔX/X1 is the proportional change. Mathematically, the switch to this alternative implies the replacement of the exponential function $e^{\beta \Delta X}$ by the power function $(1 + \Delta X / X_1)^{\beta}$, where β is an unknown coefficient to be estimated. In this model, the change factor is more suitable than the simple change for representing the change variables of protein C, platelets, creatinine, and lactate. In other words, proportional change is better than simple change for quantifying the change effects of these 4 TVBIs.
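Both the log transformation of the level variables and the switch to proportional change rest on the same identity: a logged quantity entered into the exponent of Eq. (1) becomes a power of that quantity. As a worked illustration, assuming ΔX denotes the change from the day 1 value X1 to the current value Xt:

$$e^{\beta \ln X} = X^{\beta}, \qquad e^{\beta \ln(1 + \Delta X / X_1)} = \left(1 + \frac{\Delta X}{X_1}\right)^{\beta} = \left(\frac{X_t}{X_1}\right)^{\beta}$$

One consequence is that the change effect depends only on the ratio of the current value to the day 1 value. A doubling from 50 to 100 and a doubling from 100 to 200 both multiply the hazard by the same factor, 2 raised to the power β, whereas under the simple-change form the second doubling would count twice as much in the exponent.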
An important methodological issue is whether the level variable of any TVBI had a non-monotonic effect. For example, in the construction of various versions of APACHE, several variables such as body temperature were assumed to have non-monotonic effects. To deal with this issue, by expanding from the best day 1 specification of the CLOGLOG model reported in Table 4, the level effect of each TVBI was quantified by two variables simultaneously: its day 1 variable and the log of the day 1 variable. The combination of these two variables provided a flexibility to allow the data to decide whether the TVBI in question had a non-monotonic effect, which was to be revealed by the two variables having coefficients with opposite signs. This was done for each TVBI in turn. The signs and p-values associated with the two variables were: same sign for cfDNA, with p=0.28 and 0.34; same sign for protein C, with p=0.47 and 0.21; opposite signs for platelets, with p=0.71 and 0.10; same sign for GCS, with p=0.41 and 0.85; opposite signs for lactate, with p=0.21 and 0.90. Thus, it can be inferred that within the observed data range, none of the TVBIs had a significant non-monotonic effect.
The finding that the estimated coefficients of the day 1 and current variables of creatinine are not significantly different from 0 warrants explanation. This finding was mainly due to the overlap of their weaker predictive powers with the stronger predictive powers of the day 1 and current variables of protein C and platelets. Panel 1 of Table 5 shows that in the context of the preconditions of chronic lung disease and previous brain injury, age, and duration, the day 1 variable of creatinine, with p=0.1002, did not have a significant effect on the mortality hazard, although its estimated coefficient had the biologically meaningful positive sign. This finding is misleading because the day 1 and change variables of creatinine had a strong negative correlation (r=−0.57, N=6,724, p<0.001). This negative correlation implied that large improvements in creatinine occurred mostly in patients whose initial values were relatively poor (high), so that the pool of survivors contained an increasingly high proportion of patients whose creatinine was relatively poor on day 1 but had experienced large improvement (reduction) afterwards. To control for the distorting effect of this selective improvement, the proper assessment of the effect of the day 1 variable required the simultaneous inclusion of the corresponding change variable into the model. Since a strong negative correlation between day 1 and change variables occurred for all six TVBIs, the use of the day 1 variable of each TVBI should be accompanied by the corresponding change variable. Otherwise, the effect of the day 1 variable will be understated. Panel 2 of Table 5 shows that the addition of the change variable of creatinine not only raised the AUC markedly from 0.623 to 0.693 but also helped make the corresponding day 1 variable highly significant (p=0.0036) and caused its estimated coefficient to increase markedly from 0.2345 to 0.4320.
Thus, controlling for the selective improvement, patients with relatively high day 1 creatinine were found to be at higher risk of dying. However, panel 3 of Table 5 shows that the further addition of the day 1 and change variables of protein C and platelets caused the day 1 variable of creatinine to become a non-significant explanatory variable (coefficient=0.1593, p=0.3231). This was the consequence of the overlap between the weaker predictive power of the day 1 variable of creatinine and the stronger predictive powers of the day 1 variables of protein C and platelets. Behind this overlap were the significant correlations of the day 1 variable of creatinine with the corresponding variables of protein C and platelets. The two correlations were −0.155 (N=6,724, p<0.0001) for protein C and −0.202 (N=6,724, p<0.0001) for platelets. In short, the loss of usefulness of the day 1 variable of creatinine resulted from a multicollinearity problem.
However, it is useful to remember from Panel 2 that a strong correlation between explanatory variables need not result in a multicollinearity problem.
An important feature of the current variable of a TVBI is that it could inherit the predictive powers of the corresponding day 1 and change variables. In Panel 1 of Table 6 where the current variable of creatinine was included in the CLOGLOG model with the same set of contextual variables, it had a highly significant effect (coefficient=0.5673, Chi-square=17.2, p<0.0001), as expected. In Panel 2 of Table 6 where the change variable of creatinine was added to the model, the effect of the current variable of creatinine was weakened (coefficient=0.4320, Chi-square=8.5, p=0.0036), because part of its inherited predictive power was taken back by the change variable. The correlation between the current and change variables of creatinine was 0.351 (p<0.0001). In Panel 3 of Table 6 where the current variables of protein C and platelets were further added to the model, the effect of the current variable of creatinine was markedly reduced and became non-significant (coefficient=0.1560, Chi-square=1.0, p=0.3245), because its weaker predictive power overlapped to a large extent with the stronger predictive powers of the current variables of protein C and platelets. The correlations of the former variable with the two latter variables were −0.163 (p<0.0001) and −0.300 (p<0.0001), respectively. In short, the current variable of creatinine lost its usefulness after encountering the multicollinearity problem twice.
The hazard ratios computed from the estimated coefficients by exponentiation are not suitable for comparing the relative importance of the explanatory variables because the explanatory variables do not share a common unit. To overcome the comparability problem, a new way for assessing the relative importance of the explanatory variables was introduced by transforming the CLOGLOG model into the following form:
$$\log(H_{it}) = \beta_0 + \beta_1 X_{i1t} + \beta_2 X_{i2t} + \cdots + \beta_k X_{ikt} + \cdots \qquad (7)$$
where βk is the kth element of β′ and Xikt is the kth element of Xit. Despite the fact that the explanatory variables have different physical units, the additive terms on the right-hand-side of Eq. (7) have a common unit, log(1/day), so that their magnitudes can be used to evaluate the relative importance among the explanatory variables in determining the log of hazard. Thus, βkXikt is called the additive contribution to the log of hazard of dying by the kth explanatory variable.
The additive contributions to two representative logs of hazard were then computed: (1) the predicted log of hazard computed for the mean of the non-survivor group, and (2) the predicted log of hazard computed for the mean of the survivor group. For each explanatory variable, the difference in its additive contributions to these two representative logs of hazard is then considered as its predictive power in distinguishing non-survivors from survivors: the greater the difference, the greater the predictive power.
For the day 1 specification of the model, Table 7 demonstrates the computation of the difference in the additive contributions to the log of hazard between (1) the mean of non-survivors and (2) the mean of survivors for each explanatory variable. For example, the means of cfDNA on day 1 were 6.126 μg/mL for non-survivors and 4.705 μg/mL for survivors. By multiplying these two means by the estimated coefficient of 0.1857, it was found that the additive contributions of the day 1 variable of cfDNA to the log of hazard were 1.138 for non-survivors and 0.874 for survivors. Hence, the predictive power of the day 1 variable of cfDNA was 1.138−0.874=0.264, which is shown in the last column of the table. Such computations were done for all explanatory variables in the table. From the last column, with respect to day 1 variables, cfDNA had the greatest predictive power (0.264). With respect to change variables, GCS had the greatest predictive power (0.701).
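The cfDNA arithmetic above can be reproduced with a few lines of code. The coefficient and group means are the values quoted in the text; the function names are hypothetical.

```python
# Values quoted in the text for the day 1 variable of cfDNA.
COEF_CFDNA_DAY1 = 0.1857
MEAN_NON_SURVIVORS = 6.126   # ug/mL
MEAN_SURVIVORS = 4.705       # ug/mL


def additive_contribution(coef, value):
    """One additive term of Eq. (7): coefficient times variable value."""
    return coef * value


def predictive_power(coef, mean_non_survivors, mean_survivors):
    """Difference in additive contributions between the two group means."""
    return additive_contribution(coef, mean_non_survivors) - additive_contribution(
        coef, mean_survivors
    )


contribution_non_survivors = additive_contribution(COEF_CFDNA_DAY1, MEAN_NON_SURVIVORS)
contribution_survivors = additive_contribution(COEF_CFDNA_DAY1, MEAN_SURVIVORS)
power_cfdna_day1 = predictive_power(COEF_CFDNA_DAY1, MEAN_NON_SURVIVORS, MEAN_SURVIVORS)
```

Rounded to three decimals, the two contributions are 1.138 and 0.874 and their difference is 0.264, in line with the worked figures above.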
The combined predictive power of the day 1 and change variables of each TVBI was computed by summing the predictive powers of its day 1 and change variables. For example, in the day 1 specification of the model, the combined predictive power of cfDNA was 0.264+0.028=0.292, whereas the combined predictive power of GCS was 0.021+0.701=0.722. In other words, GCS was much more powerful than cfDNA in distinguishing non-survivors from survivors. The predictive power of GCS came mostly from its change, whereas the predictive power of cfDNA came mostly from its initial level. For the current specification of the model, Table 8 shows similar computations for evaluating the relative importance of the TVBIs in terms of their current and change variables. Although most change variables in the current specification had zero contribution to the predictive power, the change variable of lactate, with a predictive power of 0.360, remained important. Since this predictive power was greater than that of its current variable (0.200), recent history was more important than current status for lactate. The combined predictive power of each TVBI in the current specification of the model was also computed. The patterns of the predictive powers of the six TVBIs in the two specifications of the CLOGLOG model turned out to be similar (
As shown in
The information in the last columns of Tables 7 and 8 was used to assess the relative importance of the day 1 or current variable against the corresponding change variable for each TVBI and create the top panel of
One temporal attribute shared by all six TVBIs was that the day 1 variable had a strong negative correlation with the corresponding change variable (r=−0.80 for lactate, −0.70 for GCS, −0.58 for cfDNA, −0.57 for creatinine, −0.36 for platelets, and −0.27 for protein C; N=6,724). This attribute suggested that sicker patients on day 1 tended to benefit more from treatments and to experience greater improvement. Keeping this general attribute in mind, for gaining insights into the temporal pattern of each TVBI, the septic patients were divided into four quartile groups in terms of the day 1 variable, and the trend of the daily averages of each group was then examined. To avoid the selection bias resulting from the death process that could misleadingly exaggerate improvements as the sickest patients in each group were successively removed, the daily records of all non-survivors were removed from the data before the daily averages were calculated. Except for the less clear evidence for platelets, the worst quartile group experienced the greatest improvement for each TVBI. To see the effects of removing non-survivors, the temporal graphs in
From the differences in predictive powers and temporal patterns, it is inferred that current management strategies produce a rapid improvement in some TVBIs (e.g. lactate, GCS) which contribute to a reduction in ICU mortality. However, the levels of some TVBIs do not change significantly over time (e.g. cfDNA, protein C), suggesting that therapeutic strategies to reduce cfDNA levels or restore protein C levels warrant further investigations. Being the two TVBIs with the greatest predictive powers, GCS and lactate turned out to show the greatest improvements: the average of the worst quartile group experienced a sharp and rapid improvement, and the averages of all quartile groups converged to a narrow range around a low risk level. This temporal pattern is consistent with the finding that most of the predictive powers of GCS and lactate (97% and 88%) come from their change variables. This finding suggests that the treatments that helped improve GCS and lactate contributed greatly to the reduction of the mortality risk level. For cfDNA, the average of the worst quartile group remained at the high risk level of 6 μg/mL, after a brief improvement from day 1 to day 4, while the average of the best quartile group increased from less than 3 μg/mL to nearly 4 μg/mL. This temporal pattern corresponded to the finding that a very high proportion of the predictive power (91%) of cfDNA came from its day 1 variable, leaving only 9% for its change variable. This finding suggests that novel strategies for reducing cfDNA can make an important contribution to improving mortality outcome.
Protein C was the third most powerful predictor of the hazard of dying, with 45% of its predictive power coming from its change variable. There was a general pattern of improvement in protein C for all quartile groups, with the worst quartile group experiencing the greatest improvement. However, by day 28, the gap between the worst and best quartile groups remained quite large, with the average of the worst quartile group being <70% of the normal level. Compared with lactate and GCS, the improvement in protein C was smaller and more prolonged, so that its contribution to the overall reduction of mortality risk was much less. This temporal pattern corresponded to the finding that less than half of the predictive power of protein C came from its change variable. This finding suggests that increasing protein C can also make an important contribution to improving mortality outcome.
Platelet counts had the second weakest predictive power, with 54% of it coming from the change variable. They had the distinctive feature of a general lack of improvement during the first few days. During the first 3 or 4 days, the average of the worst quartile group remained at the same high risk level of about 100 units, whereas the averages of the 3 better quartile groups all worsened. Beyond day 4, the averages of all quartile groups experienced a prolonged and moderate improvement until around the end of the second week. By day 28, the gap between the worst quartile group (175) and the best quartile group (350) remained large.
Creatinine, the weakest predictor, had all of its predictive power coming from its change variable. The average creatinine of the worst quartile group experienced an improvement from 350 μmol/L on day 1 to 225 μmol/L on day 6. However, with almost no further improvement, the gap between the worst quartile group (220 μmol/L) and the best quartile group (50 μmol/L) remained large until day 28. In summary, the limited improvement in creatinine made a small contribution to the overall reduction in mortality risk.
In the current specification, it was found that the current variable accounted for 100% of the predictive powers of cfDNA, protein C, platelets, and GCS, and that the change variable accounted for 64% of lactate's predictive power, which was less than the 88% in the day 1 specification of the model. These findings implied that for most TVBIs, the predictive powers of the day 1 and change variables are inherited by the corresponding current variables. Note that the reason for the change variable to represent 100% of creatinine's predictive power in both specifications was that the weaker predictive powers of the day 1 and change variables of creatinine overlapped with the stronger predictive powers of the day 1 and change variables of protein C and platelets.
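The combined predictive powers and the shares attributed to the change variable follow from simple additive arithmetic. The following is a minimal sketch in Python that uses only the predictive powers quoted in the text for cfDNA and GCS (day 1 specification) and lactate (current specification); the function and variable names are illustrative assumptions and are not part of the described tool.

```python
def combined_power(level_power, change_power):
    """Combined predictive power of a TVBI: level (day 1 or current) plus change."""
    return level_power + change_power


def change_share(level_power, change_power):
    """Fraction of the combined predictive power that comes from the change variable."""
    total = combined_power(level_power, change_power)
    return change_power / total if total else 0.0


# (level, change) predictive powers as quoted in the text
cfdna_day1 = (0.264, 0.028)        # day 1 specification
gcs_day1 = (0.021, 0.701)          # day 1 specification
lactate_current = (0.200, 0.360)   # current specification

examples = [
    ("cfDNA (day 1 spec)", cfdna_day1),
    ("GCS (day 1 spec)", gcs_day1),
    ("lactate (current spec)", lactate_current),
]
for name, (level, change) in examples:
    total = combined_power(level, change)
    share = change_share(level, change)
    print(f"{name}: combined={total:.3f}, change share={share:.0%}")
```

Because the inputs are rounded to three decimal places, a computed share may differ by about one percentage point from the percentages quoted in the text.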
Using the input data from the original 355 septic patients, the relative importance of the 6 indicators is shown in Table 9. In terms of the difference in the contributions to the log of hazard between the non-survivors and survivors by the current indicators, GCS (0.655), platelets (0.518), cfDNA (0.335), and protein C (0.343) were more important than lactate (0.201) and creatinine (0) for septic patients. In contrast, GCS (1.932) was the most important indicator for non-septic patients, followed by cfDNA (0.433), lactate (0.159) and creatinine (0.112). An advantage of this additive index over the hazard ratio is that it can reflect the combined effect of the current and change variables for each TVBI. Such combined effects are shown in
Using the day 1 variables from a subset of the 6 indicators, namely protein C, lactate, and creatinine, a binomial logit model may be applied for assigning patients into septic or non-septic groups. The septic and non-septic data were pooled and a binomial logit model was used in which the dependent variable is the probability that a patient is septic, and the explanatory variables are the day 1 variables of protein C, lactate, and creatinine. It was found that all estimated coefficients were significantly different from zero, and that their joint predictive power was moderately high, with AUC=0.67 (CI: 0.63 to 0.71). In general, patients with lower protein C, lower lactate, and higher creatinine were more likely to be septic patients.
For septic patients, the threshold probabilities of dying within 28 days for achieving some chosen objectives about sensitivity, specificity, PPV (Positive Predictive Value), and NPV (Negative Predictive Value) are shown in Table 10. For example, the threshold probability of 0.227 can achieve the best balance between sensitivity and specificity at 0.82, whereas the threshold probability of 0.632 can achieve the best balance between PPV and NPV at 0.86.
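One way such a threshold could be located is to scan candidate cut-offs and keep the one at which sensitivity and specificity are closest. The sketch below illustrates this under stated assumptions: the predicted probabilities and outcomes are toy values invented for illustration (they are not study data), the helper names are hypothetical, and the selection rule actually used to build Table 10 may differ.

```python
def confusion_rates(probs, died, threshold):
    """Return (sensitivity, specificity, PPV, NPV), predicting death when prob >= threshold."""
    tp = sum(1 for p, d in zip(probs, died) if p >= threshold and d)
    fp = sum(1 for p, d in zip(probs, died) if p >= threshold and not d)
    fn = sum(1 for p, d in zip(probs, died) if p < threshold and d)
    tn = sum(1 for p, d in zip(probs, died) if p < threshold and not d)
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    spec = tn / (tn + fp) if tn + fp else nan
    ppv = tp / (tp + fp) if tp + fp else nan
    npv = tn / (tn + fn) if tn + fn else nan
    return sens, spec, ppv, npv


def balanced_threshold(probs, died):
    """Candidate threshold that minimises |sensitivity - specificity|."""
    best_gap, best_t = None, None
    for t in sorted(set(probs)):
        sens, spec, _, _ = confusion_rates(probs, died, t)
        gap = abs(sens - spec)
        if best_gap is None or gap < best_gap:
            best_gap, best_t = gap, t
    return best_t


# Toy data, hypothetical: predicted 28-day probabilities and outcomes (1 = died)
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90]
died = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

t = balanced_threshold(probs, died)
print(t, confusion_rates(probs, died, t))
```

The same scan could be repeated with |PPV − NPV| as the criterion to look for a threshold that balances the two predictive values.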
The predictive power of this risk assessment tool was greater than those of APACHE II, MODS, or SOFA. For both MODS and SOFA, the day 1 and change variables were created to assess the predictive power of each of them in the CLOGLOG model that contained the same set of contextual variables as in this model. For APACHE II, only the day 1 variable was created because APACHE II was designed to measure disease severity within 24 hours of ICU admission. As shown in Table 11, the values of AUC are 0.802 (95% CI, 0.746-0.858) for MODS, 0.862 (95% CI, 0.817-0.907) for SOFA, and 0.774 (CI: 0.723-0.826) for APACHE II. All three were lower than the AUC achieved by the day 1 specification of the present model: 0.903 (95% CI, 0.864-0.941). Although the 6 components in MODS and in SOFA are similar, SOFA performed markedly better than did MODS. The better performance of SOFA over MODS probably reflects the fact that one of the TVBIs used in constructing SOFA involved choices of different treatments for hypotension, with worse scores for higher dosages.
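The AUC values compared here can be read as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition; the scores and labels are toy values assumed for illustration, and the sketch does not reproduce the confidence intervals reported in Table 11.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for non-survivor (positive), 0 for survivor (negative).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data)
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to a classifier that ranks no better than chance, which is why the lower confidence limits are compared against 0.5.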
To explore the possibility that adding more TVBIs into this assessment tool can enhance its predictive power, 4 additional TVBIs were added separately into the Day 1 Specification of the CLOGLOG model shown in Table 4. Three of the additional TVBIs are the remaining components of the MODS score (bilirubin, PaO2/FiO2 ratio, pressure adjusted heart rate (PAR)). Neutrophil count was also examined since neutrophils are a potential source of circulating cfDNA. The day 1 and change variables of each TVBI were entered into the model as a pair because these two variables had a strong negative correlation for each TVBI. Table 12 shows that all the variables representing these four TVBIs had p-values greater than 0.20 and hence did not have significant effects on the hazard of dying. Thus, addition of more TVBIs to the tool does not increase the predictive power of the tool.
The assessment of the usefulness of additional contextual variables was conducted by inserting each variable separately into the day 1 specification of the CLOGLOG model shown in Table 4. From Table 12, it was found that among the preconditions, congestive heart failure had a p-value less than 0.05. Despite its small p-value, it was not included in the present model for the following reasons. First, because its predictive power overlapped with that of age, its inclusion inflated the p-value of age from 0.0822 to 0.2604 so that age would be removed from the model. Second, its inclusion actually resulted in a slight reduction in the model's AUC. Based on the daily probability of dying as the classifier, the inclusion of the dummy variable representing congestive heart failure caused the AUC to decrease from 0.865 to 0.862. It is likely that for a larger sample size, congestive heart failure would have a significant effect on the hazard of dying. The precondition of liver disease had a p-value somewhat lower than 0.10, suggesting that it might have some effect on the hazard of dying. However, its coefficient was negative. Diabetes and chronic renal insufficiency also had negative but non-significant coefficients. It is not clear whether these preconditions had spurious life-saving effects, resulting from the medications for their treatments. Ischemic heart disease, chronic dialysis, and cancer had positive coefficients but their p-values were too large to be considered for inclusion in the model.
With respect to gender, being female appears to be associated with an elevated hazard of dying. However, its p-value is 0.0782. So far, physiological reasons for female septic patients to have a higher mortality risk than their male counterparts have not been found. No gender-specific differences in the administration of treatment (e.g. use of mechanical ventilation, vasopressors/inotropes, fluids) or in the length of stay in the ICU were found.
From the raw data of the 356 septic patients who contributed information to the input data matrix of the CLOGLOG model, it was observed that some sites and types of positive cultures were associated with relatively high crude death rates. For example, the 53 patients whose site of positive cultures was urinary tract had a crude death rate of 34%, and the 72 patients whose type of positive cultures was mixed had a crude death rate of 31%, compared with the overall crude death rate of 24%. To determine if information on the sites and types of positive cultures would enhance the predictive power of the CLOGLOG model, each site or type of positive cultures was represented by a dummy variable and added into the day 1 specification of the CLOGLOG model reported in Table 4. In terms of the values of the AUC that was based on the daily probability of dying as the classifier, it was found that the addition of each of the dummy variables representing the 8 sites and 6 types of positive cultures had little effect on enhancing the model's predictive power (Panel 1 of Table 13). Paradoxically, the values of the AUC decreased as a consequence of adding each of 5 sites of positive cultures (pleural cavity, blood, peritoneal, skin, and “other”) into the CLOGLOG model. This finding reveals that the intuitively appealing AUC is not a completely consistent measure of predictive power. Here, the complete consistency of a measure is defined as the property that adding an explanatory variable into the model can never result in a worse value for the measure.
A completely consistent measure of predictive power for a CLOGLOG or logit model is the Rho-square, which is defined as 1−A/B, where A is the maximum log-likelihood of the model in question, and B is the maximum log-likelihood of the null model, which has the intercept as the only unknown coefficient. In Panel 2 of Table 13, it can be seen that the addition of any of the dummy variables did not result in a decrease in the value of Rho-square. This complete consistency is the same as the complete consistency of the R-square for regression models. An important difference between them is that, as demonstrated in Panel 2, a value of about 0.2 for Rho-square can represent a very high predictive power, whereas such a value for R-square usually indicates a low predictive power. To impose a penalty on adding explanatory variables that either contain mostly random noise or are highly redundant, the Rho-square is modified into the Adjusted Rho-square, which is 1−(A−k−1)/(B−1), where k is the number of explanatory variables. This is analogous to the Adjusted R-square for keeping regression models parsimonious. As seen in Panel 3 of Table 13, most of the sites and types of positive cultures contributed negatively to the Adjusted Rho-square and hence should not be added to the CLOGLOG model as these had little effect on its predictive power.
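The two measures can be computed directly from the maximized log-likelihoods. The sketch below implements the formulas as stated above; the log-likelihood values are hypothetical numbers chosen for illustration (they are not taken from Table 13), and the function names are assumptions.

```python
def rho_square(ll_model, ll_null):
    """Rho-square = 1 - A/B, with A and B the maximized log-likelihoods
    of the model in question and of the intercept-only null model."""
    return 1.0 - ll_model / ll_null


def adjusted_rho_square(ll_model, ll_null, k):
    """Adjusted Rho-square = 1 - (A - k - 1)/(B - 1), k = number of explanatory variables."""
    return 1.0 - (ll_model - k - 1) / (ll_null - 1)


# Hypothetical log-likelihoods (log-likelihoods are negative numbers)
ll_null = -125.0
ll_base = -100.0    # model with k = 5 explanatory variables
ll_extra = -99.9    # same model plus one nearly uninformative variable (k = 6)

print(rho_square(ll_base, ll_null), rho_square(ll_extra, ll_null))
print(adjusted_rho_square(ll_base, ll_null, 5), adjusted_rho_square(ll_extra, ll_null, 6))
```

In this illustration the Rho-square rises slightly when the extra variable is added, while the Adjusted Rho-square falls, which is the behavior the penalty term is intended to produce.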
In addition to generating a predicted probability for each patient (as a survivor or non-survivor), this assessment tool can generate personalized mortality risk profiles that provide information about how different TVBIs affect a patient's risk of dying on any given day. To identify the main TVBIs that contribute to mortality risk and to determine how improvements in TVBIs can reduce the mortality risk of a patient in question, the construction and use of a mortality risk profile is described. As a basis for constructing a mortality risk profile, the best 10th percentile of survivor patients in terms of the predicted probability of dying as of the last day served as the benchmark for comparison.
Based on the estimated coefficients of the day 1 specification of the CLOGLOG model, the construction of the mortality risk profile is demonstrated for a 66-year old male patient who remained alive on day 28 (Patient A). The primary data of Patient A and the benchmark for constructing the mortality risk profile are shown in Table 14.
In Table 15, these data are then used to generate the values of all explanatory variables for both Patient A (in the 2nd numeric column) and the benchmark (in the 4th numeric column). For example, the value of the explanatory variable “Log(ProteinC_change)” for Patient A is generated from his day 1 and current values of protein C (33 and 47) as log(47/33)=0.35, where log is the natural log function. The remaining computations in the table are identical to those used in Table 7.
The elements in the mortality risk profile (last column of Table 15) are the predictive powers of the explanatory variables. For example, the predictive power of the day 1 variable of cfDNA is 0.34, which is this variable's contribution to [(the log of hazard of Patient A)−(the log of hazard of the benchmark)]. The greater the predictive power, the greater is the variable's ability to account for the predicted overall mortality gap between Patient A and the benchmark.
The bottom 5 elements of the last column of Table 15 are alternative measures of the overall mortality gap between Patient A and the benchmark. The overall difference in log of hazard (1.849) is the sum of the predictive powers of all explanatory variables. The HR representing the mortality gap is 6.4. The predicted probability of dying in 28 days (P28) is 13.3% for Patient A and 2.2% for the benchmark, so that the gap in P28 was 11.1%.
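The arithmetic behind these summary figures can be checked in a few lines. The following sketch uses only numbers quoted in the text (protein C values of 33 and 47, an overall log-hazard difference of 1.849, and P28 values of 13.3% and 2.2%); the function names are illustrative assumptions.

```python
import math


def log_change(day1_value, current_value):
    """Change variable on the natural-log scale: log(current / day 1)."""
    return math.log(current_value / day1_value)


def hazard_ratio(log_hazard_difference):
    """Hazard ratio implied by a difference in the log of hazard."""
    return math.exp(log_hazard_difference)


protein_c_change = log_change(33, 47)   # Patient A, protein C
overall_hr = hazard_ratio(1.849)        # Patient A versus the benchmark
p28_gap = 13.3 - 2.2                    # gap in P28, in percentage points

print(round(protein_c_change, 2), round(overall_hr, 1), round(p28_gap, 1))
```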
The elements in the last column of Table 15 are used to create the top graph in
By summing up the predictive powers of the day 1 and change variables of each TVBI, the combined predictive power of each TVBI was obtained as shown in the middle graph of
From this graph, it is seen that Patient A's improvements in GCS and lactate were not large enough to fully compensate for the initial disadvantage, although the improvement in his cfDNA was able to do so. For ease of communication, the exponential transformation was applied to each bar in the middle graph to convert it into a hazard ratio (HR). For ease of visualization, (HR−1) was plotted for each TVBI in the bottom graph of
Using the dynamic Excel version of Table 15, it was found that by increasing Patient A's protein C (47) to the level of the benchmark (131), his P28 could be decreased from 13.3% to 7.2%.
To demonstrate how the knowledge of the dynamic nature of the benchmark is useful in assisting the identification of the TVBIs that are a high priority for improvement, the mortality risk profile of another male patient (Patient B) who was discharged on day 12 was constructed. The primary data of Patient B and the benchmark are shown in Table 16, the computations are presented in Table 17, and his mortality risk profile is shown in
Partly as a consequence of having an advanced age of 79 and the precondition of chronic lung disease, his P28 was quite high (50%). The finding that among the 6 TVBIs, lactate had the highest HR of 2.1 suggests that lowering his lactate level would be most effective for reducing his mortality risk. However, a closer examination of Tables 16 and 17 revealed that Patient B started with a very low level of lactate (1.3 mmol/L) and maintained the same level on the day of discharge, whereas the benchmark started with a rather high lactate level (4.5) and experienced a large decrease (−3.2) to a low level (1.3) on the last day. Thus, the impressive improvement of the benchmark's lactate could not be replicated by Patient B, because his lactate level was already very low.
In
To validate the CLOGLOG model, the estimated coefficients of the day 1 specification of the model were used to predict the mortality outcomes of ICU patients from nine Canadian hospitals who were originally non-septic but later became septic in the ICU. Of the 33 such patients recruited from the same ICUs and during the same time frame as the septic patients, 28 had non-missing values for all the explanatory variables and hence were used to form the validation group.
Before conducting the validation, the definition of day 1 was considered: (1) the day of admission or (2) the day of becoming septic. It was decided to conduct two validations: (V1) with the admission date as day 1 and (V2) with the day of becoming septic as day 1. For V2, the censoring date of any patient who was still alive on day 28 was extended until he/she died, was discharged, or remained alive on day 28 after becoming septic. Since there were 5 non-survivors in V1 and 6 non-survivors in V2, the proportion of non-survivors was 18% in V1 and 21% in V2.
For each patient in the validation group, the values of all explanatory variables as of her/his last day were first found and then her/his predicted hazard of dying was computed by inputting these values into the estimated day 1 specification of the CLOGLOG model. The predicted hazards of all 28 patients were then translated into the predicted probabilities of dying in 28 days for conducting an ROC analysis. The resulting AUC turned out to be quite high: 0.939 (95% CI, 0.845-1.000) for V1 and 0.886 (95% CI, 0.746-1.000) for V2, compared with 0.903 (95% CI, 0.864-0.941) for the derivation group of 356 patients. Since the validation group had a much smaller sample size than did the derivation group, the 95% confidence intervals of the AUCs for V1 and V2 were much wider than that for the derivation group. Despite the smaller sample size, the validation results were statistically supportive, because the lower limits of 0.845 and 0.746 were much higher than 0.5. The ROC curves for the V1 and V2 validations are compared with that of the derivation group in
In Table 18, the average values of the six TVBIs on day 1 and on the last day, as well as the change between the two dates, are shown separately for the non-survivors and survivors in V1 and V2. The contrasts between non-survivors and survivors were more consistent with the expected effects on the last day than on day 1 in both V1 and V2. The expectation was that for lactate, cfDNA, and creatinine, the means should be higher for the non-survivors than for the survivors, whereas for GCS, protein C, and platelets, the opposite would occur. With respect to the change from day 1 to the last day, the difference between non-survivors and survivors was consistent with the expectation for platelets, GCS, and lactate but inconsistent for cfDNA, protein C, and creatinine. In both V1 and V2, the consistent changes were greater than the inconsistent changes.
In terms of the contributions to the difference in log of mortality hazard between non-survivors and survivors, the contributions of lactate (1.30) and GCS (0.50) were much greater than those of platelets (0.14), cfDNA (0.07), protein C (0.06), and creatinine (−0.15) in V1. Similarly, the contributions of lactate (1.19) and GCS (0.56) were also much greater than those of platelets (0.17), cfDNA (0.00), protein C (−0.01), and creatinine (−0.16) in V2. Thus, the validation group confirmed that among the six TVBIs, lactate and GCS, mostly through their change effects, had the strongest predictive powers, irrespective of whether day 1 was defined as the date of admission or the day of becoming septic.
To demonstrate that the longitudinal logit (L-Logit) model specified in Eq. 5 could be used to reveal similar insights as those obtained via the CLOGLOG model, it was used to create the mortality risk profile of Patient A. With the input data matrix of the day 1 specification of the CLOGLOG model, the coefficients of the L-Logit model were estimated by simply replacing “LINK=CLOGLOG” by “LINK=LOGIT” in the Logistic procedure of SAS. With AUC=0.904 (95% CI, 0.865-0.942), its predictive power was about the same as that of the CLOGLOG model.
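The two link choices differ only in how the linear predictor is mapped to a probability. The sketch below shows the standard inverse complementary log-log and inverse logit functions; it is an illustration of these general definitions rather than a reproduction of the SAS estimation, and the function names are assumptions.

```python
import math


def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# Compare the two mappings over a range of linear-predictor values
for eta in (-5.0, -3.0, -1.0, 0.0, 1.0):
    print(f"eta={eta:+.1f}  cloglog={inv_cloglog(eta):.4f}  logit={inv_logit(eta):.4f}")
```

For strongly negative values of the linear predictor, which correspond to small daily probabilities of dying, the two mappings nearly coincide, which is consistent with the two models showing about the same predictive power.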
Analogous to the predicted log of hazard in the CLOGLOG model, the predicted logit in the L-Logit model can be decomposed into additive terms in the following way:
it=log(
where
A tool for assessing the mortality risk in septic patients has been developed. At the core of the tool was a CLOGLOG model that created a composite indicator from six TVBIs (cfDNA, protein C, lactate, GCS, platelets, and creatinine) and some contextual variables (age, presence of chronic lung disease or previous brain injury, length of stay). With the same set of contextual variables, the set of six TVBIs was stronger in predictive power (AUC=0.90) than not only APACHE II (AUC=0.77) but also MODS (AUC=0.80) and SOFA (AUC=0.86), both of which were also constructed from six TVBIs. The demonstration that both day 1 and change variables are important helps explain why APACHE II, which was based on the observed values of TVBIs in the first 24 hours, had the weakest predictive power.
There are several strengths to the present study. First, the way of formulating the CLOGLOG model made the model a versatile version of the Cox model for gaining longitudinal insights. The versatility comes from the removal of the restrictive assumption of proportional hazards and the replacement of the maximum partial-likelihood method by the maximum likelihood method for estimation. With the alternative estimation method, a flexible time function was used that helped overcome a distortion resulting from selection bias. Second, the present tool can generate mortality risk profiles that show the relative contributions of the six TVBIs to each patient's overall mortality risk. For example, a septic patient whose risk was mainly due to deficiencies in protein C may benefit from therapies that enhance the conversion of protein C to the anticoagulant activated protein C (APC). Potential therapies include ART-123, a recombinant thrombomodulin that enhances APC generation. In a phase IIb clinical trial, ART-123 was shown to be safe and potentially efficacious in septic patients. Patients whose risks were mainly due to elevations in cfDNA may benefit from strategies that lower cfDNA. Administration of recombinant DNase I to septic animals has been shown to reduce circulating levels of DNA, suppress organ damage, and improve survival. Third, this is the first study that describes the use of the CLOGLOG model for predicting the probability of dying on any day or in any time interval in septic patients. Extensive exploration has also been conducted to see whether adding more TVBIs (such as bilirubin, PaO2/FiO2 ratio, pressure adjusted heart rate) or more contextual variables (such as types and sites of infection) could improve the model. The exploration confirmed that the present TVBIs provide a robust and concise model, and the statistical indicators for confidence turned out to be quite strong.
A tool to predict the mortality risk over time in septic patients and for generating personalized risk profiles has been developed and validated. This tool is based on a CLOGLOG model that takes advantage of the changing values of cfDNA, protein C, platelet count, creatinine, GCS, and lactate to achieve a high predictive power. The tool can help stratify patients who have similar clinical presentations but may respond differently to treatments due to patient-specific pathophysiology. The tool has utility for prognostic and predictive enrichment which may be leveraged to improve the success of clinical trials.
While the present application has been described with reference to examples, it is to be understood that the scope of the claims should not be limited by the embodiments set forth in the examples, but should be given the broadest interpretation consistent with the description as a whole.
All publications, patents and patent applications are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety. Where a term in the present application is found to be defined differently in a document incorporated herein by reference, the definition provided herein is to serve as the definition for the term.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CA2018/050833 | 7/9/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62529767 | Jul 2017 | US |