The present invention is framed in the technological field of glucose prediction and control. Specifically, the invention is directed to a more precise monitorization of the glucose concentration values in humans.
An object of the present invention is to provide a method for enhancing glucose prediction which allows to control the glucose level more precisely by taking into account new control parameters.
A second object of the present invention is to provide a system for enhancing glucose prediction configured to carry out the method of the invention.
Diabetes represents one of the major health challenges of the 21st century. It is estimated that 463 million people worldwide had diabetes in 2019, and by 2045 this number may rise to 700 million. About 10% of people with diabetes suffer from type 1 diabetes (T1D), the most severe kind. T1D is characterized by autoimmune destruction of the β-cells in the pancreas, responsible for the secretion of the hormone insulin, an essential hormone needed to maintain blood glucose (BG) concentrations in a narrow range (70-140 mg/dL).
People with T1D need to replace endogenous insulin secretion with life-long subcutaneous exogenous insulin administration by multiple daily injections or continuous infusion via a portable pump, to avoid the detrimental effects of hyperglycemia. Good BG control can help diabetic patients to prevent or delay serious long-term complications of diabetes. Therefore, T1D patients must monitor their BG levels throughout the day and night to avoid BG problems such as hyperglycemia or hypoglycemia.
Nowadays, it is possible to continuously monitor the glucose level and its trends by using continuous glucose monitoring (CGM) systems. A CGM basically consists of a sensor measuring interstitial glucose levels, and transmitting information to an external display device able to provide alerts to the patient if the glucose level is reaching certain thresholds. In sensor pump integrated systems, CGM can also send the quasi-time-continuous readings to an insulin pump that can take actions depending on glucose levels.
The technological advances in diabetes treatment with the advent of CGMs have paved the way for a fully automatic treatment regime, automatic insulin delivery. It is then referred to as the “Artificial Pancreas” (AP) system, a closed-loop system where a control algorithm decides the proper insulin infusion in a continuous-time manner.
Therefore, the main idea through the AP system is to connect the CGM with the insulin pump via a control algorithm to compute the optimal insulin dose administered through an insulin pump to prevent T1D patients from having hypoglycemia or hyperglycemia and keep their BG concentration in an acceptable range (70-180 mg/dL), based on information received from a CGM every 5 minutes.
CGMs have also paved the way to decision support systems (DSS) when connected with an insulin pump or smart insulin pen, providing automated data analytics useful to the patient for a better diabetes self-control.
The ability to predict glucose along a given prediction horizon (PH), and estimation of future glucose trends, is the most important feature of any AP and DSS, in order to be able to take preventive actions to entirely avoid risk to the patient. Glucose prediction can appear in an AP as part of the control algorithm itself, such as in systems based on model predictive control (MPC) techniques, or as part of a monitoring system to avoid hypoglycemic episodes. In a DSS, glucose prediction allows to inform the patient about risks, predict hypoglycemic and hyperglycemic events, detect abnormal patient glucose responses, and take better informed decisions for improved glucose control. Therefore, high-reliability glucose prediction models have the potential to significantly improve diabetes management.
“Seasonal” stochastic time series models present regular patterns of changes and fluctuations that repeat periodically such as SARIMA: Seasonal AutoRegressive Integrated Moving Average, or SARIMAX: SARIMA including eXogenous variables. When trained from fixed-length clustered postprandial (after meal) glucose time series, they have exhibited relatively higher BG prediction accuracy for long term PHs and achieved both lowest root-mean-square error (RMSE) and mean absolute percentage error (MAPE) for different PHs when compared to other solutions, including nonlinear solutions, such as, recurrent neural networks and nonlinear regression.
Despite SARIMA and SARIMAX models have demonstrated previously their effectiveness for glucose prediction in long PHs with high accuracy, fixed-length time series requirements do not allow these methods to be applied in normal life (i.e. free-living conditions with the support of CGM data), which includes a variety of event-related periods (meals, nights, exercise, hypoglycemia treatments, etc.) with different and variable lengths.
The invention is related to a method for prediction of blood glucose based on seasonal local models. The method of the invention improves the accuracy of the blood glucose prediction by integrating a set of local models with a forced seasonality trained from clustered incomplete time series data.
Accurate predictions of blood glucose (BG) concentration with longer prediction horizon (PH) might improve type 1 diabetes therapy by allowing artificial pancreas (AP) and decision support systems (DSS) to adjust the therapy based on BG future values.
The method of the invention allows to predict blood glucose by integrating the predictions of a number of seasonal local models, “local” referring to each seasonal model representing a data set of similar glucose profiles observed along CGM historical data.
In the modeling step, the number of data sets and their corresponding glucose profiles characteristics (prototypes) are obtained by techniques for clustering of incomplete data (for instance, Partial Distance Strategy Fuzzy C-Means). Then, it is identified a seasonal model for each data set, for example using a Box-Jenkins methodology. Finally, the online BG prediction is obtained by local model integration through real-time membership-to-cluster estimation.
Firstly, raw historical data time series are provided, which comprises a record of measured blood glucose (BG) data from a continuous glucose monitor (CGM), and timestamps of main meal events. Optionally, additional data can be considered:
The method of the invention preprocesses original CGM historical data under free-living conditions (normal life) to obtain sets of similar glycemic profiles (clusters) useful for seasonal model identification of different event periods. This CGM data can be from a single patient, producing an individualized model where clusters characterize intra-patient variability, or from a population of patients, providing a population model where clusters characterize intra- and inter-patient variability.
For that, the raw CGM data time series and events timestamps are preprocessed for obtaining a set of event periods. For that, the raw data CGM time series are split into said event periods. An event period starts at the timestamp of said event and ends on the timestamp of the next event. In the particular event period after last meal in a day, the period event is split in two period events. Preferably, the first one having a fixed duration, then, starting the second one after said duration is finished and ending on the first event of the next day. This second period is used to represent a nocturnal period. The set of event periods can be divided into different subsets using event labels when available (for instance, subset 1 collecting breakfast, lunch and dinner event periods, subset 2 collecting night event periods, subset 3 collecting hypoglycemia treatments event periods, etc.). When available, exogenous inputs time series are also split in said subsets. In this case, each subset will be clustered separately.
Having being obtained the subset of event periods, length of said event periods is regularized. This is achieved by expanding fictitiously the event periods until the duration of each event period is equal to the duration of the period event having the maximum duration (s). For that, “not a number” (NaN) values are added after the last measured BG value and exogenous inputs (when available), producing time series with incomplete data.
Then, the event periods are clustered by a clustering algorithm for incomplete data, preferably using a Partial Distance Strategy Fuzzy C-Means clustering (PDSFCM) algorithm. Therefore, the events periods belongs partially to c clusters, represented by their prototypes (centroids), according to a fuzzy membership depending on the distance of said event periods to said cluster prototypes. Preferably, an optimal number of clusters (c) is obtained by using an index taking into account clusters cohesion and separation, such are Xie-Beni index (XBI) or Fukuyama-Sugeno index (FSI). In one embodiment, the clustering step is carried out by PDSFCM algorithm, which computes the cluster prototypes and membership values by minimizing the equation:
In Eq. 1, X={x1, x2, . . . , xn} denotes regularized-length event periods extracted from raw data time series, V=(v1, v2, . . . , vc) is a vector of unknown cluster prototypes vi∈p, d is a distance function representing the partial distance between the event periods (incomplete time series) and cluster prototypes and m ∈[1, ∞) is a fuzziness parameter, U=[uij] with i=1, 2, . . . , c, and, j=1, 2, . . . , n is a matrix of memberships wherein:
The clustering step does not affect modeling as far as it is a data preprocessing step, and this process cannot cause overfitting, i.e. the process does not introduce additional unnecessary parameters, which avoid the generalization of the model.
Then, a prediction model of the blood glucose is calculated, by modeling data in each cluster by a “local model” to simplify the glucose dynamic system, building for example a SARIMA (or SARIMAX if exogenous input time series are included) local model for each cluster, thus, obtaining a set of local models providing a set of glucose predictions.
The concept of seasonality has demonstrated successfully its accuracy in the BG prediction models for long PHs, but it is not directly applicable for the use in the normal life. Enforcing the concept of seasonality in normal life data then is very useful for developing BG prediction models or for detecting abnormal behaviors of people with diabetes.
First, for each cluster a time series is obtained by concatenating, time ordered, the event periods in said cluster, or a subset of them with a membership value (μ) to said cluster above a given threshold μmin=0. Glucose values previous to the event timestamp (hence referred as presampling data) are also saved in a number given by the maximum expected order of the AR component of the to-be-identified local models. Presampling data will serve to properly initialize the local models.
Each SARIMA model for each time series is defined by the equations:
G(t)=α+w(t) Eq. 2
ϕp(z−1)P(z−s)∇sD∇dw(t)=θq(z−1)θQ(z−s)ε(t) Eq. 3
In Eq. 2, G(t) is the glucose concentration at time t, α is a constant term, called intercept, ∇ is the backward difference operator, defined as:
∇w(t)=w(t)−w(t−1)
Also, d is the non-seasonal integration order of the time series, D is the seasonal integration order of the time series, ∇s is the seasonal backward difference operator, defined as:
∇sw(t)=w(t)−w(t−s)
Also, ε(t) is an stochastic error following a white noise process defined by:
ε(t)˜WN(0, σ2)
And ϕp(z−1),P(z−s), θq(z−1) and θQ(z−s) are polynomials in the lag (back-shift) operator z−1 of degree p, q, P, and Q, respectively, defined as:
ϕp(z−1)=1−ϕ1z−1−ϕ2z−2− . . . −ϕpz−p Eq. 4
P(z−s)=1−sz−s−
2sz−2s− . . . −
Psz−Ps Eq. 5
θp(z−1)=1+θ1z−1+θ2z−2+ . . . +θqz−q Eq. 6
θQ(z−s)=1+θsz−s+θ2sz−2s+ . . . +θQsz−Qs Eq. 7
When exogenous inputs are considered, each SARIMAX model for each cluster time series is defined by the equations:
where nx is the number of exogenous inputs, Xi is the exogenous input i at time t and ηi,r(z−1), i=1, . . . , nx are polynomials in the lag operator z−1 of degree ri, defined as:
ηi,r
Alternatively, in Eqs 2-3 and 8-9, the difference operators ∇ and ∇s can be applied to the signal G(t), instead of w(t). Thus, a SARIMA model can also be represented as
∇sD∇dG(t)=α+w(t) Eq. 2′
ϕp(z−1)P(z−s)w(t)=θq(z−1)θQ(z−s)ε(t) Eq. 3′
And a SARIMAX model as
Preferably, the identification of an optimal seasonal model for each time series is carried out by means of the Box-Jenkins methodology which indicates the steps for identifying, estimating, and checking time series models: checking for stationarity or non-stationarity and differencing the data, if necessary; identifying and selecting an appropriate model structure; estimating the parameters of the chosen model; diagnostic checking of the chosen model; and forecasting, and re-identifying a new model if necessary.
Being obtained the set of SARIMA (or SARIMAX) glucose models, an integration step is performed for obtaining a real-time BG prediction for a desired PH, by a time-varying weighting of the set of glucose predictions given by the seasonal local models, defined by the equations:
where tp is the time at which the prediction is computed, t is the time of the predicted value, in the time interval [tp,tp+PH], Ĝi(t|tp) is the predicted glucose by local model i, for i={1, 2, . . . , c}, γi(tp) is a fuzzy membership corresponding to the weight of model i at time tp, and Ĝ(t|tp) is the final predicted glucose value for future time t, computed at current time tp.
Time-varying weights γi(tp), i=1, . . . , c, are computed from the membership value of CGM measurements to each cluster i, with respect to a time window [t0,t1], denoted as ui(t1; t0), and defined by:
where [G]t
In one embodiment, the time window [t0,t1] is defined as the time interval from the timestamp of the last event (tek) up to current time (tp), so that
γi(tp)=ui(tp;tp−tek) Eq. 13
In one embodiment, time-varying weights γi(tp), i=1, . . . , c, are computed in a two-step procedure, In the first step, clusters with a high enough contribution in a first time window [tp−tw1,tp] are selected, as defined by the index set
I={i|u
i(tp;tp−tw1)≥μmin(tp)} Eq. 14
Preferably tw1=tek is chosen. In a second step, selected clusters contributions are computed, with respect a second shorter time window [tp=tw2,tp], so that tw2<tw1, as defined by:
Finally, the BG prediction Ĝ(t|tp), t in [tp,tp+PH], obtained is saved.
The integration step for obtaining a real-time BG prediction is preferably carried out by performing the following phases:
The method of the invention could also comprise a step of calculating a crispness index, which gives a measure of how much the glucose prediction is produced by a single model. The glucose model is a multi-model system, but is more reliable if one of the available local models match perfectly. Therefore, fuzziness is included to give robustness by the integration of the local models when one of them is not enough to provide the best estimation.
For obtaining the crispness index at time tp, the Manhattan distance, normalized to the interval [0,1], between the vector composed by the membership values to the c clusters and the equally-distributed-membership vector (1/c, 1/c, . . . , 1/c) representing the lowest crispness (maximum fuzziness) is used, defined as:
The method of the invention could also comprise a step of calculating a normality index, which gives a measure of how much the measured CGM data in the current event period is similar to historical data represented by the clusters.
The normality index at time tp is calculated by:
Therefore, the proposed system allows diabetic patients to detect abnormal states which can guide therapeutic decisions.
The invention refers also to a system for controlling blood glucose automatically, which comprises one CGM sensor; and a control unit, connected to the CGM sensor.
The control unit of the system is adapted to perform the steps of the method of the invention explained.
The system of the invention could further comprise a pump for delivering insulin, which is connected to the control unit. The control unit in this case would be configured to calculate the amount of insulin to provide according to the calculated BG prediction.
Also the system could comprise a communication module connected to the control unit, to connect to an external device. In this case, the control unit would be configured to calculate the cripness index following the steps explained for the method of the invention. Having obtained the cripness index, the communication module sends said cripness index to the external device.
In the same way, the system could also comprise an alert module connected to the control unit, to activate an alarm when an abnormal glucose behavior is detected. In this case, the control unit is configured to calculate a normality index following the steps explained for the method of the invention. Then, the alert module activates the alarm when the calculated normality index is abnormal.
To complement the description being made and in order to aid towards a better understanding of the characteristics of the invention, in accordance with a preferred example of practical embodiment thereof, a set of drawings is attached as an integral part of said description wherein, with illustrative and non-limiting character, the following has been represented:
The method of the invention preprocesses original continuous glucose monitoring (CGM) historical data under free-living conditions (normal life) to obtain sets of similar glycemic profiles (clusters) useful for seasonal model identification of different event periods.
First of all, original CGM historical data of different full days, including timestamps of meals, hypoglycemia treatment, and other events of interest, will be preprocessed. The long-term data is partitioned into a set of “event-to-event” time subseries, called event periods, driven by said events timestamps, to enforce seasonality, being original periods in different lengths.
Seasonality is enforced to apply the seasonal stochastic modeling techniques. The initial instant, duration, and final instant of each event could be variable, since original time series data is partitioned into a set of “event-to-event” time subseries, driven by event timestamps. Therefore, each time subseries initial point is the time instant when the event starts and its final point is the initial point of the next event. A special case is the after dinner where a fixed-length (for example, 6 hours) postprandial period is considered, then, the night period starts, finalizing at breakfast time. The subset of event periods can be divided into different subsets using event labels when available (for instance, subset 1 collecting breakfast, lunch and dinner event periods, subset 2 collecting night event periods, subset 3 collecting hypoglycemia treatments event periods, etc.). When available, exogenous inputs time series are also split in said subsets. In this case, each subset will be clustered separately.
Hence, each day provides at least four subseries with different lengths (three main meals and night period), and these lengths change every day. In order to force seasonality, the period with the maximum duration is detected and its length is stored as “s”. Then, all the subseries duration is fictitiously expanded to “s”, by adding “not a number -NaN-” values after the last available value, giving rise to regularized-length incomplete (missing data) time series data. In this way, all the periods will have the same length (s) but most of them will have some NaNs in the final positions of the time subseries.
Preprocessed data are then clustered and, once the c (number of clusters) sets of similar enough event periods with the same (enforced) length, resulting from the said partitioning of the original CGM time series, are available as training sets, each cluster is modeled as a SARIMA (or SARIMAX) “local model” with seasonality period s to simplify the glucose dynamic prediction, leading to a set of local models for glucose prediction Ĝi(t) for i={1, 2, . . . , c}. Finally, the BG prediction is obtained by integrating the local models predictions.
For clustering the preprocessed data, a clustering algorithm, for example fuzzy C-Means (FCM) algorithm, is used. Nevertheless, FCM algorithm cannot be directly applied to incomplete data (missing data), since all data is required to compute the cluster prototypes (centroids) and the distance measures.
Therefore, a Partial Distance Strategy FCM (PDSFCM) algorithm is used to clustering event periods incomplete data. PDSFCM partitions data into c>1 clusters, wherein the optimal number of clusters could be determined automatically by using an index taking into account clusters cohesion and separation, such are Xie-Beni index (XBI) or Fukuyama-Sugeno index (FSI) index, by minimizing the following objective function:
In Eq. 1, X={x1, x2, . . . , xn} denotes regularized-length event periods extracted from raw data time series, V=(v1, v2, . . . , vc) is a vector of unknown cluster prototypes vi∈p, d is a distance function representing the partial distance between the event periods (incomplete time series) and cluster prototypes and m ∈[1,∞) is a fuzziness parameter, U=[uij] with i=1, 2, . . . , c, and, j=1, 2, . . . , n is a matrix of memberships wherein:
Then, data in each cluster is modeled as a SARIMA (or SARIMAX) “local model”, which is an expanded form of its non-seasonal counterpart ARIMA model that includes as new model components seasonal autoregressive (SAR) and seasonal moving average (SMA) terms. SAR and SMA terms are added so that an observation at time t depends on: previous observations and previous stochastic errors. In both cases, the previous values are taken at times with lags that are multiples of the seasonality period s, as detailed in equations (2)-(7). This means that predicted values will not only depend on recent past values, but also on behaviors observed in recent events in the same cluster.
First, for each cluster a time series is obtained by concatenating, time ordered, the event periods in said cluster, or a subset of them with a membership value to said cluster above a given threshold μmin>0.
Given a concatenated CGM time series {G(t)|t=1, 2, . . . , k} resulting from the clustering of regularized-length event periods, a SARIMA model is expressed as:
G(t)=α+w(t) Eq. 2
ϕp(z−1)P(z−s)∇sD∇dw(t)=θq(z−1)θQ(z−s)ε(t) Eq. 3
In Eq. 2, G(t) is the glucose concentration at time t, α is a constant term, called intercept, ∇ is the backward difference operator, defined as:
∇w(t)=w(t)−w(t−1)
Also, d is the non-seasonal integration order of the time series, D is the seasonal integration order of the time series, ∇s is the seasonal backward difference operator, defined as:
∇sw(t)=w(t)−w(t−s)
Also, ε(t) is an stochastic error following a white noise process defined by:
ε(t)˜WN(0, σ2)
And ϕp(z−1), P(z−s), θq(z−1) and θQ(z−s) are polynomials in the lag (back-shift) operator z−1 of degree p, q, P, and Q, respectively, defined as:
ϕp(z−1)=1−ϕ1z−1−ϕ2z−2− . . . −ϕpz−p Eq. 4
P(z−s)=1−sz−s−
2sz−2s− . . . −
Psz−Ps Eq. 5
θp(z−1)=1+θ1z−1+θ2z−2+ . . . +θqz−q Eq. 6
θQ(z−s)=1+θsz−s+θ2sz−2s+ . . . +θQsz−Qs Eq. 7
Each model can be expressed in a short form as SARIMA (p, d, q)(P,D, Q)s. The SAR, and SMA coefficients can be estimated by fitting SAR and SMA models to the residual errors ε(t) themselves.
Clustering event periods, previously enforcing seasonality to a period “s” equal to the length of the original longest period, would provide similar enough event periods as training and validation sets, for determining a SARIMA model for each cluster i, preferably by using a Box-Jenkins methodology, leading to a set of local glucose predictions Ĝi(t|tp) for i={1, 2, . . . , c}, for a desired prediction horizon (PH) at time instant (tp).
Then, a local models integration is performed by a fuzzy approach, for obtaining a real-time BG prediction Ĝ(t|tp)) for a desired prediction horizon (PH) at time instant (tp), wherein the available (CGM previous to tp) data belongs partially (from 0 to 1) to all clusters and the BG prediction is computed by weighting the c estimations according to the fuzzy membership (γi(tp)) at time tp to each cluster i. The integration step is defined by the equations:
In Eq. 8,γi(tp) is the fuzzy membership at time tp, i.e. the weight of each SARIMA model in predicting the Ĝi(t|tp) for i={1, 2, . . . , c} is a set of glucose predictions given by the local models for each cluster. Fuzzy membership γi(tp) is computed in two steps. In the first step, fuzzy membership of the CGM in the time window [tp−tw1,tp] is computed in Eq. 9. A window from the start of current event period up to current time tp is considered in this preferred embodiment. Then the set of clusters with a membership higher than the threshold μmin(tp) is selected in Eq. 10. Then, final weights are obtained in Eq. 11 computing fuzzy membership of CGM to the selected clusters in the shorter time window [tp−tw2,tp], Finally, the BG prediction Ĝ(t|tp) obtained is saved. Optionally, only the first step could be applied, computing the fuzzy membership from a single time window [tp−tw1,tp].
After an event period is finalized and next event period is started, the whole period data is then appended as new data in the cluster, having the highest membership, as well as historical data in the corresponding local model. It is not expected that the profile of the modified CGM data cluster changes too much, since the aspect of the new time series is expected to be similar to the others in the cluster.
It is fundamental to have new series in the cluster since SARIMA models use previous event periods data of the same subset/cluster, for their predictions.
Additionally, it would be interesting to have all this new series stored in the cluster for an eventual online updating of the SARIMA models. The method of the invention could also comprise a step of calculating a “crispness index”, giving information about how much the glucose prediction is produced by a single model, which is provided along with the available glucose predictions and glycemic status estimations at each time instant
Provided that the glucose model is a multi-model, the “crispness index” gives information about how much the model is reliable. Fuzziness is included in the global model to give robustness by the integration of the local models when one of them is not enough to provide the best estimation. Nevertheless, the global model is more reliable if one of the available local models match perfectly the system, i.e. the membership is one for one of the local models and zero for the others.
For quantifying, the normalized distance between the membership values and an equally-distributed-membership vector (1/c, 1 /c, . . . , 1/c) is used. This vector represents the lowest crispness (highest fuzziness), with equal contribution of all local models. Corresponding to a crispness index value of 0. The highest crispness is obtained for a membership of 1 for one of the clusters, and 0 for the others, providing a distance to (1/c, 1 /c, . . . , 1/c) equal to 1/(2(1−1/c)), which should correspond to a crispness index value of 1. The crispness index is thus computed as:
which corresponds to a normalized Manhattan distance between the membership values and the vector (1/c, 1/c, . . . , 1/c).
The method of the invention could further comprise a step of calculating a “normality index”, giving information about how normal is the actual behavior or if it is an abnormal situation.
The “normality index” is designed to determine whether the measured CGM data in the current event period is similar to historical data represented by the clusters, and represents whether the period trying to estimate is normal or not, in the sense that the behavior trying to forecast, and then could be estimated by a single local model or a combination of them, or is not among said past behaviors, being beyond the past behaviors and then extrapolated by the models. Possibility memberships (γiP(tp)) are used in this case, instead of fuzzy memberships. The possibility membership could be computed as described in the equations:
In Eq. 17 to 19:
Also, d is a distance function (Euclidean distance function between CGM data segment from t0 to t1 and the corresponding segment of the cluster prototypes (centers) and m∈[1∞) is the fuzziness parameter.
If the period trying to estimate is abnormal and far away from all the available local models, the distance will be similar to all prototypes then resulting in all the memberships equal when Eq. 12, 14, 15 is applied. The same result is obtained when a period trying to estimate is normal and just in the middle of the local models available. Therefore, Eq. 12 is not useful for detecting abnormal periods.
Using Eq. 17-19 the abnormal period results in very low membership for all the clusters. The sum of all those possibility memberships (Σi=1eγip(tp)) computed online provides a measure of the abnormality of the period: the sum of the possibility memberships closer to zero the more abnormal period.
On detecting abnormal behaviors, an extrapolation is performed in predicting BG, and an alert about an abnormal situation is activated, for hardware problems, missed bolus leading to extreme hyperglycemia, unusual exercise leading to hypoglycemia, or any other behavior beyond the past time series available for local models identification. Also, abnormal behavior can provide alerts to the user, highlighting the necessity of new models to be learned due to new user behaviors.
The calculation of the “crispness index” and the “normality index” could be performed by using an online monitoring system.
To analyze the performance of the proposed method, a long-term period of four months data from a virtual patient is used for training purposes (clustering and training of local models). The simulated data were generated using the adult patient number 1 of an extended version of the UVA/Padova simulator with variability sources, and neither exercise not missed boluses were considered. Additionally two 2-month data sets were generated for the same patient for independent validation. In the validation set 1, neither exercise not missed boluses were considered similarly to the training data set. In validation set 2, exercise and missed boluses were added.
In the training data and validation set 1, three daily random meals of 40, 90 and 60 g at 7:00, 14:00 and 21:00 hours were considered, and variability sources include: meal-size variability (coefficient of variance 10%), meal-time variability (standard is deviation 20 min), uncertainty in carbohydrate (CHO) estimation (uniform distribution between −30% and +10%), meal absorption rate (kabs±30%), CHO bioavailability (f±10%), insulin absorption model parameters (kd, ka1, ka2 ±30%), and insulin sensitivity parameters (Vmx, Kp3±30%). In the validation set 2, additionally to all the above, one missed bolus per week was considered, and a weekly pattern of one hour exercise session at 18:00 at 50% intensity on Tuesday, Thursday and Saturday. A variability on the exercise start time (standard deviation 20 min) and duration (coefficient of variance 10%) was considered.
Training data, validation data 1 and validation data 2, included timestamps for breakfast, lunch, dinner and hypo treatments. Labels indicating the event type were also considered.
The method of the invention, in this particular example, starts with the preprocessing of the long-term time series. First, timestamps for night periods are generated from dinner and next-day breakfast reported event, and night period events are labelled accordingly. Then, three subsets of event periods are considered:
Each subset will be considered independently, applying the method of the invention to each of them, that is, 3 clustering and local model indentification problems are solved. As a result of this pre-processing step, the following event-to-event periods are obtained:
The maximum duration for each subset will determine the seasonality considered for that subset when identifying the local models.
Then, PDSFCM clustering, in combination with the FSI cluster validity index for number of clusters determination, is performed for each subset, after length regularization based on the maximum event-to-event period length for each subset. This resulted in 5 clusters for subset 1, 3 clusters for subset 2 and 2 clusters for subset 3. Once all periods are classified, seasonal time series per cluster per subset are created by concatenating those periods assigned to each cluster.
Additionally, presampling data, i.e. data before the period starts from the CGM time series, is inserted for each concatenated period for an adequate initialization of AR and MA model components. This historical data is needed by the models in a length depending on the models orders. A length of five pre-sampling data is considered enough, in this case.
The four months available training simulated data for the adult patient number 1 of the UVA/Padova simulator with three meals per day and multiple variability sources will be divided into two sets. For data in each cluster, the first 80% of the data is used as a training set for building suitable models, and 20% of the data is used for testing the prediction accuracy for these models.
Model training sets thus consisted of concatenated event periods from the first 80% of elements in each cluster, with enforced seasonalities of 111 for subset 1 (106 samples plus 5 pre-samples), 70 for subset 2 (65+5), and 81 for subset 3 (76+5). The appropriate SARIMA model, including structure and parameters, for each cluster is identified through Box-Jenkins methodology, using the rest 20% of data in each cluster for local model validation.
The metrics used for model forecasting accuracy are root mean squared error (RMSE[mg/dL]) and mean absolute percentage error (MAPE[%]), defined as:
In Eq. 16 and 17, n is the number of observations, G(i) denote the ith observation, e(i)=G(i)−Ĝ(i) is the forecasting error, and {tilde over (G)}(i) is a forecast of G(i).
Table I shows the appropriate SARIMA model for each cluster with the average prediction accuracy results for different PH ranging from 15 min to 4 hours, expressed as MAPE (RMSE) computed with Eq. 16-17 as follows: for each event period in the local model validation data (20% of data in a cluster), the average MAPE (RMSE) of the predicted glucose trajectory as time moves along the event period was computed. Then, the average for all event periods in the local model validation data was computed.
In the most challenging scenario of a 4-h PH forecasting period, for meal events (subset 1) a MAPE of 8.39% (RMSE of 10.67 mg/dL) is obtained for cluster 1, 5.09% (10.17 mg/dL) for cluster 2, 5.86% (13.03 mg/dL) for cluster 3,5.92% (14.71 mg/dL) for cluster 4, and 6.16% (10.66 mg/dL) for cluster 5. For night periods (subset 2), 4-h PH forescasting error is 5.18% (5.89 mg/dL) for cluster 1, 6.89% (9.87 mg/dL) for cluster 2, and 4.50% (5.66 mg/dL) for cluster 3. For hypoglycemia treatment periods (subset 3), only cluster 1 data included long enough periods to compute a 4-h PH forecasting, being cluster 2 data shorter than 3 hours. For a 2-h PH, forecasting errors are 5.50% (8.86 mg/dL) for cluster 1 and 3.08% (3.85 mg/dL) for cluster 2. Results for different PHs (15, 30, 60, 120, 180, and 240 minutes) are presented in Table I.
Table II shows the average MAPE (%) and RMSE (mg/dL) values of the global seasonal model (GSM) glucose predictions (after the integration of local models with Eqs. 8 to 11) for the independent 2-month validation data set 1 (same scenario are training data). A value of tw1 equal to the time elapsed from the start of the current event was considered. A value of tw2 equal to 20 minutes was considered. Finally, the threshold μmin(tp) was set to 20% of the maximum membership ui(tp;tp−tw1) for i=1, . . . , c (with c=5 for subset 1, c=3 for subset 2, and c=2 for subset 3).
The results reported in Table II allow us to conclude that the GSM, when challenged with data similar to the one represented by the local models, exhibits high prediction accuracy for larger PHs (up to 240-min-ahead prediction), therefore, GSM could be helpful to allow the diabetic patients and glucose management systems anticipate therapeutic decisions.
Table III shows the average MAPE (%) and RMSE (mg/dL) values of global glucose predictions for the independent 2-month validation set 2, where missed insulin boluses at mealtime happened once per week and exercise 3 times per week, which are events not present in the training data. It is thus expected a higher error, as observed in Table III, which amounts to about 4% increase for long-term PH. However, forecasting errors are still considered globally small, with 12.99% MAPE for a 120-min-ahead prediction, 16.95% for 180-min PH, and 18.39% for 240-min PH.
The integration process is updated online for each validation event period in order to get the online BG prediction by following the steps of:
In addition to the glucose predictions value, a crispness index and a normality index were calculated at each time instant. The crispness index, using for its calculation the changing with time fuzzy memberships, is high (close to one) when a single model represents the behavior for some time, and very low (close to zero) when interpolation among all clusters is needed with equal weight. In a normal period, a high crispness index will be an indication of trust in the prediction. The normality index, using for its calculation the changing with time possibilistic memberships, is high (close to one) when a lot of the available models can represent the behavior and very low (close to zero) when none of them represent the behavior and, therefore, an abnormal situation is presented. A threshold can be set on the normality index to trigger a warning of abnormal response observed (0.2 or lower in this example).
Two cases can be devised in the plots with respect to crispness index:
Regarding normality index, as shown in the
Abnormal states could be detected through the normality index. In the case of abnormal behavior time series, the MAPE and RMSE will be high, since this kind of abnormal behavior was not included for model identification (in the training data). This was illustrated in Table III, with validation dataset 2 including missed boluses and exercise, not present in data used for models training.
Thus, the abnormal state (response to an unexpected missed bolus) can be detected through the normality index, where the lower values below 0.2 indicate an abnormal state. This means that no local model can represent the behavior and therefore, an alarm for the user can be launched to take corrective actions before upcoming extreme hyperglycemia is reached. An artificial pancreas and decision support system can also take automatic actions when this alarm is received.
To illustrate the detection of abnormal states,
Additionally a period around sample 6300 is also detected as abnormal, corresponding to an unusual sustained glucose close to hypoglycemia in a late postprandial response.
As demonstrated, the crispness index is useful for measuring goodness of a single cluster to represent the recent CGM data giving then confidence in the estimation, especially when analyzed in combination with the normality index. The normality index is useful for detecting the abnormal behavior/states allowing for the triggering of corrective actions if needed to mitigate risks for the patient.