The present disclosure relates to methods for monitoring a person using a wearable device having at least two inertial sensors, and to a related method of continuously monitoring the respiratory rate of the person using information about the activity in progress.
Respiratory rate is a fundamental prognostic factor that provides important information on the health of a person. Many pathological conditions of the heart and lungs, in particular pneumonia and cardiac arrest, affect respiratory rate and could be predicted with prolonged monitoring of the respiratory rate.
Although the relevance of respiratory rate as a prognostic factor is widely demonstrated in the literature, the current gold standard for measuring the respiratory rate is the number of breaths performed in one minute, identified through auscultation or observation, which is not suitable for prolonged monitoring outside the clinical environment. An alternative is the use of dedicated devices, but a limitation found in several studies is that physiological parameters are usually detected with spot measurements and when the subject is at rest, while it is known that physical activity influences cardiorespiratory function.
The patent application WO2019/012384 discloses a wearable device for continuous monitoring of the respiratory rate of a person, comprising three inertial sensors applied on parts of the body of the person as shown in
Even if this prior method provides good estimations of the respiratory rate of a person, there is still a need to improve the accuracy of the estimations, in particular while the person is performing dynamic activities.
Studies carried out by the inventors have shown that it is possible to further improve the prior method disclosed in the PCT patent application WO2019/012384. In particular, it has been found that it is possible to exploit the sensor system of the above PCT patent application using only two inertial sensors installed on a person's body for identifying the person's activity in progress with the method as defined in claim 1. This outstanding result is substantially obtained through the steps of:
Preferred embodiments are defined in the annexed claims.
In the ensuing description, the methods of the invention will be illustrated with reference mainly to the particular case in which three inertial sensors (hereinafter also referred to as “sensor units”) are installed on the person's body, as shown in
Respiratory parameters are known to change during different activities and in different postures; a system that combines respiratory parameters and human activity recognition (HAR) would therefore provide even more clinically relevant information.
In the prior sensor system shown in
From the post-processing side, the algorithm to extract breathing parameters allowed results to be obtained that confirm what is known in the literature [A. Angelucci, D. Kuller, and A. Aliverti, “A Home Telemedicine System for Continuous Respiratory Monitoring,” IEEE J. Biomed. Health Informatics, vol. 25, no. 4, pp. 1247-1256, April 2021, doi: 10.1109/JBHI.2020.3012621], [W. Qi and A. Aliverti, “A Multimodal Wearable System for Continuous and Real-time Breathing Pattern Monitoring During Daily Activity,” IEEE J. Biomed. Health Informatics, December 2019, doi: 10.1109/JBHI.2019.2963048], i.e. that the respiratory frequency is higher in dynamic conditions.
Furthermore, it has to be noted that most systems are not able to provide measurements of the respiratory rate during demanding dynamic activities like running. This system is advantageous both for sports and medical applications, due to its ability to measure this parameter in a broad range of situations.
However, the signal-to-noise ratio is too low while climbing the stairs, and the algorithm could not be applied in that condition.
The results obtained showed an overall good capability to recognize different activities, independently of the age or gender of the subjects. Although only time-domain features, and not frequency-domain features, were used in the case with three units, the comparison between the use of a single unit and the use of three units showed that the latter works better, with higher accuracy and F1-score for both machine learning and deep learning methods. A final consideration regards the first step of the data preparation: removing the initial transient might have led to an overestimation of the accuracy.
The present disclosure presents an advanced prototype suitable for continuous monitoring and shows how human activity recognition (HAR) can be performed from the raw sensor data.
The 20 healthy subjects involved in the research (9 men, 11 women) were between 23 and 54 years old at the time of the study, with a mean age of 26.8 years, a mean height of 172.5 cm and a mean weight of 66.9 kg. The experimentation was approved by the Ethical Committee of Politecnico di Milano (protocol number 20/2020) and all participants signed an informed consent.
The protocol included seven static postures (sitting with support, sitting without support, supine, prone, left decubitus, right decubitus, standing) and five dynamic activities (walking slow at 4 km/h, walking fast at 6 km/h, running, climbing up and down the stairs, cycling). The walking and running activities were performed on a treadmill, while the cycling activity was performed on an ergometer.
As shown in
The sensor system used for the test is composed of three units, one located on the thorax, one on the abdomen and one on the lower back, used as a reference.
Data coming from the three units may be collected either by means of an ANT USB2 Stick that is plugged into a personal computer during the acquisitions or by an Android smartphone that supports ANT.
The sensor data are composed of three accelerometer components, three gyroscope components and three magnetometer components, and they are sent to the microcontroller at a 40 Hz rate. The microcontroller then computes the 9-axis quaternion and transmits one quaternion out of four to the USB2 stick through a radio-frequency antenna integrated in each unit, resulting in a 10 Hz frequency, with the ANT communication protocol. The USB2 Stick is the receiver of the data sent by the microcontroller and is configured as the master, while the three units work as the slaves and are the transmitters of the sensors' detections. The topology of the network is called Shared Channel and is shown in
The nine components provided by the IMU sensor are calibrated and then used by the microcontroller to calculate a quaternion, i.e. a four-component hypercomplex number [q1 q2 q3 q4]. The fusion of the data collected from the sensor is performed using the iterative sensor fusion algorithm developed by Madgwick et al. [S. O. H. Madgwick, A. J. L. Harrison, and R. Vaidyanathan, “Estimation of IMU and MARG orientation using a gradient descent algorithm,” in 2011 IEEE International Conference on Rehabilitation Robotics, 2011, pp. 1-7], which computes the quaternions representing the orientation changes of each unit relative to the earth frame. Each component is expressed through a floating-point value ranging from −1 to 1 and is transmitted in a byte of the data payload. Moreover, a counter is also present, incremented every four quaternions calculated at a frequency of 40 Hz, in order to identify the n-th transmission.
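Merely as an illustrative, non-limiting example, the following Python sketch shows how an equivalent Madgwick fusion may be reproduced offline from raw IMU samples, assuming the open-source `ahrs` package; the array names, input units and initial orientation are assumptions and do not reproduce the firmware implementation.

```python
# Hedged sketch: offline Madgwick sensor fusion, assuming the open-source `ahrs` package.
import numpy as np
from ahrs.filters import Madgwick

FS = 40.0                               # IMU sampling rate stated in the text (Hz)
madgwick = Madgwick(frequency=FS)

def fuse_marg(gyr, acc, mag):
    """gyr, acc, mag: illustrative (N, 3) arrays of raw samples, in the units
    expected by the library. Returns decimated unit quaternions [q1, q2, q3, q4]."""
    q = np.array([1.0, 0.0, 0.0, 0.0])  # assumed initial orientation
    quaternions = np.empty((len(gyr), 4))
    for i in range(len(gyr)):
        q = madgwick.updateMARG(q, gyr=gyr[i], acc=acc[i], mag=mag[i])
        quaternions[i] = q
    # One quaternion out of four is transmitted (40 Hz -> 10 Hz), as described above.
    return quaternions[::4]
```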
The process leading to the extraction of respiratory parameters from the data collected by the units is performed offline using software that implements the previously validated algorithm by Cesareo et al. [A. Cesareo, Y. Previtali, E. Biffi, and A. Aliverti, “Assessment of breathing parameters using an inertial measurement unit (IMU)-based system,” Sensors (Switzerland), vol. 19, no. 1, pp. 1-24, 2019, doi: 10.3390/s19010088], implemented in the Python programming language, with small variations to adapt the algorithm to the processing of dynamic activities. In particular, the variations concern the cut-off frequencies of some of the implemented filters. The whole elaboration algorithm can be subdivided into four main parts: pre-processing, dimension reduction, spectrum analysis and processing.
In the pre-processing phase, the data, divided by unit of origin, are organized in four arrays, where missing data are replaced after interpolation is performed. The quaternions are created by combining the arrays. The same selection was done to train the Human Activity Recognition algorithm that is presented in the next section.
Afterwards, the quaternion product is computed, providing the orientations of the thoracic and abdominal units referred to the orientation of the reference unit. Equations (1) and (2) show how these computations are performed:
As a consequence, the non-respiratory movements are reduced because angular changes are referred to the reference unit, which does not detect breathing-related motions but moves integrally with the trunk. Then, the baseline is computed by means of a moving average over 97 samples for each quaternion component and subtracted from the components in order to remove the residual non-breathing movement. The generated components are the input for the dimension reduction block.
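Equations (1) and (2) are not reproduced here; merely as an illustrative sketch of the relative-orientation and baseline-removal steps described above, the following Python code may be considered, where the [w, x, y, z] component order, the Hamilton product convention and the side on which the reference quaternion is conjugated are assumptions.

```python
import numpy as np

def q_conj(q):
    """Conjugate of a quaternion [w, x, y, z]."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def q_mult(a, b):
    """Hamilton product a ⊗ b of two quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def relative_orientation(q_unit, q_ref):
    """Orientation of the thoracic/abdominal unit referred to the reference unit
    (assumed form; the exact expressions are given in (1) and (2))."""
    return np.array([q_mult(q_conj(r), u) for u, r in zip(q_unit, q_ref)])

def remove_baseline(q_rel, window=97):
    """Subtract a 97-sample moving-average baseline from each quaternion component."""
    kernel = np.ones(window) / window
    baseline = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, q_rel)
    return q_rel - baseline
```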
With the aim of reducing the dimension of the dataset, Principal Component Analysis (PCA) [S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemom. Intell. Lab. Syst., vol. 2, no. 1-3, pp. 37-52, 1987], [M. Ringnér, “What is principal component analysis?,” Nat. Biotechnol., vol. 26, no. 3, pp. 303-304, 2008] is performed. The first component, the one with the greatest amount of explained variance, is computed for the thorax and for the abdomen, is considered as the respiratory signal and constitutes the basis for the spectrum analysis.
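A minimal sketch of this dimension-reduction step, assuming scikit-learn is used and that the four baseline-free quaternion components of one unit are arranged column-wise, is reported below; the function name is illustrative.

```python
from sklearn.decomposition import PCA

def first_principal_component(q_components):
    """q_components: (N, 4) baseline-free quaternion components of one unit.
    Returns the (N,) first principal component, taken as the respiratory signal."""
    pca = PCA(n_components=1)
    return pca.fit_transform(q_components).ravel()
```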
The generated signals are filtered with a 3rd order Savitzky-Golay FIR (Finite Impulse Response) smoothing filter with a window length of 31 samples. This filter uses the linear least squares method to fit successive sub-sets of adjacent data with a third order polynomial. In this way, the noise is decreased without changing the shape or the height of the signal peaks. Then, the mean (fmean) and the standard deviation (fstd) of the inverse of the distances between subsequent peaks are considered to obtain a frequency estimate. fmean and fstd are used to compute fthresh, and the procedure is performed both for the thoracic component and for the abdominal component as in (3):
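Relation (3) itself is not reproduced here; merely as an illustrative sketch of the smoothing and peak-based frequency estimate described above, assuming SciPy and a 10 Hz signal rate, the following may be used. The derivation of fthresh from fmean and fstd follows (3) and is intentionally left out.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks

FS = 10.0   # quaternion stream rate (Hz), taken from the transmission frequency above

def peak_based_frequency(signal, fs=FS):
    """Smooth the respiratory signal (3rd order Savitzky-Golay, 31 samples) and
    estimate f_mean and f_std from the inverse of the distances between peaks."""
    smoothed = savgol_filter(signal, window_length=31, polyorder=3)
    peaks, _ = find_peaks(smoothed)
    inst_freq = fs / np.diff(peaks)      # inverse of peak-to-peak distances (Hz)
    return inst_freq.mean(), inst_freq.std()

# f_thresh is then computed from f_mean and f_std as in (3), separately for the
# thoracic and the abdominal component.
```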
According to an aspect, fthresh_min may be different for static postures and dynamic activities. As schematically shown in
It is worth remarking that, according to one aspect, the activity in progress is determined preliminarily without calculating a respiratory rate of the person. This determination is obtained by means of a pre-trained activity recognition algorithm receiving as entries at least the reference quaternion of the reference unit. According to an option, the activity in progress is determined with the pre-trained activity recognition algorithm receiving as entries the reference quaternion and also the first quaternion generated by the thoracic or by the abdominal unit.
As illustrated in
According to an aspect illustrated in
Once the threshold frequencies of both the thoracic and the abdominal unit are obtained, the low-frequency threshold is computed as the minimum between the abdominal low threshold and the thoracic low threshold. The use of a low threshold helps in the identification of the power spectral density (PSD) peak related to the respiratory rate and excludes very low frequency peaks, which are often caused by movement artifacts. Subsequently, the PSD estimate is computed employing Welch's method, with a Hamming window, 300 samples as window size and 50 samples of overlap.
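A minimal sketch of this PSD estimate with the parameters stated above, assuming SciPy's implementation of Welch's method, is the following; the 10 Hz sampling rate is taken from the quaternion transmission frequency described earlier.

```python
from scipy.signal import welch

def psd_estimate(respiratory_signal, fs=10.0):
    """Welch PSD with a Hamming window, 300-sample segments and 50-sample overlap."""
    freqs, psd = welch(respiratory_signal, fs=fs,
                       window="hamming", nperseg=300, noverlap=50)
    return freqs, psd
```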
The PSD maximum in the interval between the computed low threshold and a maximum (for example 1 Hz for static postures, 0.75 Hz for walking and cycling, 1.4 Hz for running) is identified (fpeak) and is used to build the adaptive band-pass filter settings (centred on fpeak). In particular, the upper and lower cut-off frequencies, for both thorax and abdomen, are obtained as in (4) and (5):
The final processing block comprises all the processes intended to extract breathing frequency and other respiratory parameters from the signals obtained after the dimension reduction block.
The first step is the application of the band-pass filter with the previously set cut-off frequencies fU and fL. Since these frequencies depend on fpeak, the result is an adaptive filter based on the specific analysed recording.
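Equations (4) and (5) are not reproduced here; merely as an illustrative sketch of the general structure of this step, the code below identifies fpeak within the activity-dependent band and band-pass filters around it, with a fixed half-width and a second-order Butterworth design used only as placeholders for the actual cut-off relations (4) and (5).

```python
import numpy as np
from scipy.signal import butter, filtfilt

def find_f_peak(freqs, psd, f_low, f_max):
    """PSD maximum between the computed low threshold and the activity-dependent
    maximum (e.g. 1 Hz static, 0.75 Hz walking/cycling, 1.4 Hz running)."""
    band = (freqs >= f_low) & (freqs <= f_max)
    return freqs[band][np.argmax(psd[band])]

def adaptive_bandpass(signal, f_peak, fs=10.0, half_width=0.2):
    """Band-pass filter centred on f_peak. The real cut-offs fU and fL follow (4)
    and (5); the fixed half_width and filter order used here are assumptions."""
    f_l = max(f_peak - half_width, 0.05)
    f_u = f_peak + half_width
    b, a = butter(2, [f_l, f_u], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)
```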
Then, a parametric tuning based on the fpeak value is performed. This is necessary for the subsequent steps of filtering and of maxima and minima detection. In particular, the involved parameters are the window length, in terms of samples, of the third order Savitzky-Golay filter, the minimum peak distance and the minimum prominence threshold, through which it is possible to set a measure of relative importance: the algorithm chooses the tallest peak in the signal and ignores all peaks within the decided distance. A more detailed description of the parameters can be found in the work by Cesareo et al. [A. Cesareo, Y. Previtali, E. Biffi, and A. Aliverti, “Assessment of breathing parameters using an inertial measurement unit (IMU)-based system,” Sensors (Switzerland), vol. 19, no. 1, pp. 1-24, 2019, doi: 10.3390/s19010088].
Afterwards, the filtered signals are further smoothed through the application of a third order Savitzky-Golay FIR filter, to optimize the subsequent detection of maxima and minima points, which are identified applying the previously set parameters. Moreover, besides the thoracic and the abdominal signals, the process is repeated also for the sum of the two signals, once they have been filtered with the Savitzky-Golay filter. The respiratory rate is thus obtained breath-by-breath, and the values obtained for each posture or activity are reported in Section 3.
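Merely as an illustrative sketch of how a breath-by-breath respiratory rate may be derived from the detected maxima, assuming SciPy, the following may be considered; the window length, minimum distance and minimum prominence values are placeholders for the fpeak-based tuning described above.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks

def breath_by_breath_rate(signal, fs=10.0, sg_window=31,
                          min_distance=10, min_prominence=0.1):
    """Return breath-by-breath respiratory rate (breaths/min) from detected maxima.
    sg_window, min_distance and min_prominence stand in for the f_peak-based tuning."""
    smoothed = savgol_filter(signal, window_length=sg_window, polyorder=3)
    maxima, _ = find_peaks(smoothed, distance=min_distance, prominence=min_prominence)
    breath_periods = np.diff(maxima) / fs       # seconds per breath
    return 60.0 / breath_periods
```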
After the window selection, the data processing to train the activity recognition algorithm is different from the one for the respiratory analysis.
The next step involved the creation of a single large dataset containing all the activities, properly labelled, for all the subjects. In a first phase, it was chosen to run the algorithm on the data of the reference unit, because it can be considered representative of the subject's positions. Secondly, the algorithm was trained on the signals coming from all three units (thoracic, abdominal and reference). In both cases, a dataset was obtained by merging the tasks “sitting without support” and “sitting with support” into a single label called “sitting”, and the tasks “walking at 4 km/h” and “walking at 6 km/h” into a label called “walking”, so the final dataset has 10 labels. This was done to increase the variability of the signal during the training process, so that the algorithm can distinguish a person sitting with a back support from a supine one, and a fast walk from a run.
The resulting dataset was unbalanced, because the labels “sitting” and “walking” had about twice the data of the other labels. The unbalancing was kept in the situation with one unit, while the data were balanced in the training with three units. A common balancing procedure was used, which consists in reducing the samples of each label to the same number as the activity with the lowest amount of data.
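A minimal sketch of the label merging and downsampling-based balancing described above, assuming the per-window data are held in a pandas DataFrame with a "label" column (an illustrative layout, not the actual one), is the following:

```python
import pandas as pd

def merge_and_balance(df, balance=True, seed=42):
    """Merge the sitting and walking sub-tasks and, optionally, downsample every
    label to the size of the least represented one (as done for the three-unit case)."""
    merged = df.replace({"label": {
        "sitting with support": "sitting",
        "sitting without support": "sitting",
        "walking at 4 km/h": "walking",
        "walking at 6 km/h": "walking",
    }})
    if not balance:
        return merged
    n_min = merged["label"].value_counts().min()
    return (merged.groupby("label", group_keys=False)
                  .apply(lambda g: g.sample(n=n_min, random_state=seed)))
```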
The implemented pre-processing steps were data standardization, label encoding and segmentation. Standardization is performed with (6), so that data are centred on 0 and properly scaled. μ is the mean and σ is the standard deviation.
Label encoding is needed because the scikit-learn library [F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825-2830, 2011], which was used in Python for the present research work, only handles real numbers. Integers from 0 to 9 were used to encode the labels in the dataset. The data were then segmented in non-overlapping windows of 200 samples in length, equal to 2 seconds of recording.
After these steps, splitting into training and test sets was required. It was chosen to use 80% of the data for the training set and the remaining 20% for the test set. The seed of the random generator was set equal to 42.
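Merely as an illustrative sketch of this pre-processing pipeline (standardization as in (6), label encoding with scikit-learn, segmentation into non-overlapping 200-sample windows, 80/20 split with seed 42), the following may be considered; the array layout and the majority-vote assignment of one label per window are assumptions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

def preprocess(signals, labels, window=200):
    """signals: (N, C) sensor channels; labels: (N,) activity names per sample."""
    # Standardization as in (6): centre the data on 0 and scale them.
    standardized = (signals - signals.mean(axis=0)) / signals.std(axis=0)

    # Label encoding: activity names -> integers 0..9.
    encoded = LabelEncoder().fit_transform(labels)

    # Segmentation into non-overlapping 200-sample windows (2 s of recording).
    n_windows = len(standardized) // window
    X = standardized[:n_windows * window].reshape(n_windows, window, -1)
    y = np.array([np.bincount(encoded[i * window:(i + 1) * window]).argmax()
                  for i in range(n_windows)])    # one label per window (assumed majority vote)

    # 80/20 train/test split with the random seed set to 42.
    return train_test_split(X, y, test_size=0.2, random_state=42)
```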
In machine learning methods, feature extraction must be performed before the model training. The selected features are in both the time domain and the frequency domain for one unit, while only the time-domain ones were used for the three units.
The time domain features were extracted from the time series of the signal and are the following: mean, standard deviation, variance, kurtosis, skewness, peak-to-peak distance, median, interquartile range. The frequency domain features were extracted from the Fast Fourier Transform (FFT) of the signal and are the following: mean, standard deviation, skewness, maxima and minima of the FFT, mean and maximum of the power spectral density.
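A minimal sketch of this per-window feature extraction for a single channel, assuming NumPy and SciPy, is reported below; the exact PSD estimator used for the frequency-domain features is not specified in the text, so a simple periodogram is used here as an assumption.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def extract_features(window):
    """Time- and frequency-domain features for one 1-D window of a single channel."""
    time_feats = [
        window.mean(), window.std(), window.var(),
        kurtosis(window), skew(window),
        np.ptp(window), np.median(window),
        np.percentile(window, 75) - np.percentile(window, 25),   # interquartile range
    ]
    spectrum = np.abs(np.fft.rfft(window))
    psd = spectrum ** 2 / len(window)          # simple periodogram (assumed estimator)
    freq_feats = [
        spectrum.mean(), spectrum.std(), skew(spectrum),
        spectrum.max(), spectrum.min(),
        psd.mean(), psd.max(),
    ]
    return np.array(time_feats + freq_feats)
```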
Three machine learning methods were used: a K-Nearest Neighbor classifier (KNN), a Random Forest classifier (RF) and a Support Vector Machine (SVM).
In the case of the KNN classifier, the metric chosen for the computation of the distance was the Euclidean metric [L. E. Peterson, “K-nearest neighbor,” Scholarpedia, vol. 4, no. 2, p. 1883, 2009]. The optimal number of neighbors K is around 5, since beyond this value the accuracy score decreases.
In the case of the RF classifier [C. Vercellis, Business Intelligence: Data Mining and Optimization for Decision Making. Wiley Online Library, 2009], the splitting rule used to create the nodes of the trees that compose the forest is the Gini criterion: in each node, the splitting attribute is chosen by minimizing the impurity, as is traditionally done with RF classifiers.
Since the data of this research project could not be separated linearly in the original space, a kernel was used to develop the SVM. In this case, it was decided to use the Radial Basis Function kernel, which can be expressed mathematically as in (7):
where σ is the variance, a hyperparameter of the kernel, and ||X1−X2|| is the Euclidean distance between two points X1 and X2. In this case, the distance is used as an equivalent of dissimilarity: when the distance between the points increases, they are less similar. By default, σ is taken equal to one, so the kernel is represented by a bell-shaped curve that decreases exponentially as the distance increases and is approximately 0 for distances greater than 4.
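A minimal sketch of the three classifiers with the settings stated above, assuming scikit-learn and pre-computed feature matrices, is the following; any hyperparameter not mentioned in the text is left at the library default and is therefore an assumption.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# KNN with Euclidean metric and K = 5, RF with the Gini criterion,
# SVM with the RBF kernel; remaining hyperparameters are scikit-learn defaults.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "RF": RandomForestClassifier(criterion="gini", random_state=42),
    "SVM": SVC(kernel="rbf"),
}

def train_and_score(models, X_train, y_train, X_test, y_test):
    """Fit each classifier on the training features and report test accuracy."""
    return {name: clf.fit(X_train, y_train).score(X_test, y_test)
            for name, clf in models.items()}
```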
Five networks were created using deep learning methods; their characteristics are shown in detail in
All networks use the Sequential model in order to construct a plain stack of layers where each layer has exactly one input tensor and one output tensor. During the optimization part of the algorithm, the error at the current state must be iteratively estimated. For all networks, the chosen loss function was the Sparse Multiclass Cross-Entropy Loss as in (8), used to calculate the model's loss so that the weights can be updated to minimize the loss on subsequent evaluations. The Cross-Entropy loss is defined as:
Where w refers to the model parameters, yi is the true label and ŷi is the predicted label.
After that, to reduce the losses, an optimizer was used to adjust the neural network's attributes, such as the weights and the learning rate. The optimization method used for the CNNs and the GRU was the Adam optimizer [D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014], based on adaptive estimates of lower-order moments. For the LSTM networks, the RMSprop optimizer was used. The batch size is a hyperparameter that defines the number of samples taken from the training dataset to train the network before updating the internal model parameters; the chosen value was 16.
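The layer-by-layer architecture of the five networks is given in the figures and is not reproduced here; merely as an illustrative sketch of the common training configuration stated above (Sequential model, sparse categorical cross-entropy, Adam optimizer for the CNNs and the GRU, batch size 16), assuming Keras, the following may be considered with a placeholder 1D-CNN body.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_1dcnn(window=200, n_channels=4, n_classes=10):
    """Illustrative 1D CNN; the actual layer configuration is given in the figures."""
    model = models.Sequential([
        layers.Input(shape=(window, n_channels)),
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training with the batch size stated in the text:
# model.fit(X_train, y_train, batch_size=16, validation_data=(X_test, y_test))
```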
Respiratory rate was studied for the 20 involved subjects in the different postures and activities. Due to the unfavorable signal-to-noise ratio, parameters could not be extracted in the case of climbing stairs with the previously validated algorithm, therefore those values are not included in the analysis. The dataset presented puts together the two sitting positions, but separates “walking slow” and “walking fast” to show the sensitivity of the respiratory analysis algorithm to the different levels of effort. The boxplots of the median values obtained for each subject in the different conditions are shown in
The distributions were statistically compared with one another using the One-Way Repeated Measures ANOVA. The Shapiro-Wilk normality test failed (p<0.05); the Equal Variance Test (Brown-Forsythe) also failed (p<0.05). The differences in the mean values among the groups were greater than would be expected by chance; there is a statistically significant difference (p≤0.001).
To isolate the group or groups that differ from the others, the Bonferroni t-test was used as multiple comparison procedure. The p-values obtained with these comparisons were analyzed. The activities “walking slow”, “walking fast”, “running” and “cycling” have a statistically significant difference with respect to the static postures (p<0.05 in all cases), but not always with respect to one another. “Walking slow” and “walking fast” do not significantly differ from “cycling” (p=1.000) or from one another (p=1.000). This result confirms that during physical activity the respiratory rate increases and that this phenomenon is more evident when the activity is more demanding (during “running”). Also, there is a statistically significant difference between the “supine” and the “prone” positions (p=0.045) and between the “lying right” and the “prone” positions (p=0.049). This is likely due to the fact that the processing algorithm is designed to analyze the movement of the two units on the front with respect to the reference unit, while in the prone position also the dorsal movement contributes to ventilation.
All methods with three units have better performance when compared to the case with only one unit. The best performing one is the 1D Convolutional Neural Network (1DCNN) with three units. It has also to be considered that the features extracted for the three units concern only the time domain, which suggests that the inclusion of the frequency features could further improve the accuracy.
A block diagram of an algorithm for estimating respiratory parameters of a person is schematically illustrated in
The required steps to calculate the principal components are the following:
For static activities, the first principal component of the inertial sensors installed on the thorax and abdomen (referred to the reference sensor) is taken, because the respiratory signal is thought to lie in it. For dynamic activities such as walking or running, it is observed that the first principal component contains information about the movement (such as cadence) and, in order to retrieve the respiratory signal, higher order principal components may be considered. It has been noticed that the first principal component contains information about the movement and the second principal component contains information about the respiratory signal, but in general the respiratory component during dynamic activities may be found in principal components of higher order than the first.
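A hedged sketch of how the recognized activity type may drive the choice of the principal component used as the respiratory signal, as described above, is the following; it assumes scikit-learn, a two-component PCA and a simple static/dynamic flag provided by the activity recognition step.

```python
from sklearn.decomposition import PCA

def respiratory_component(q_components, is_dynamic):
    """Return the principal component used as the respiratory signal:
    the first for static postures, the second (or, in general, a higher order one)
    for dynamic activities such as walking or running."""
    pca = PCA(n_components=2)
    components = pca.fit_transform(q_components)
    return components[:, 1] if is_dynamic else components[:, 0]
```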
According to this disclosure, an attempt has been made to apply the algorithm shown in
It was observed that, in static postures, reliable estimations may be obtained by processing the first principal component. By contrast, for dynamic activities such as walking and running, more accurate estimations may be obtained by processing the second principal component, whilst the first principal component contains information about the cadence of the dynamic activity. Moreover, it has been found that it is possible to obtain accurate estimations using only two inertial sensors, namely a single inertial sensor (preferably installed either on the thorax or on the abdomen) and the reference inertial sensor. Preferably, after having determined the type (static/dynamic) of the human activity in progress, the step of estimating respiratory parameters may be carried out with an inertial sensor installed on the thorax and with the reference inertial sensor. According to an alternative, the step of estimating respiratory parameters is carried out with an inertial sensor installed on the abdomen and with the reference inertial sensor.
Therefore, as shown in
Merely as a comparative example, power spectral densities of the first principal components are represented in
As schematically represented in
Number | Date | Country | Kind
--- | --- | --- | ---
102021000029204 | Nov 2021 | IT | national

Filing Document | Filing Date | Country | Kind
--- | --- | --- | ---
PCT/IB2022/061089 | 11/17/2022 | WO |