The present invention relates to a computer implemented method for classifying quality of biological sensor data. The invention further relates to a biological sensor and to a computer program and a computer-readable storage medium for performing the method according to the present invention. The method and devices, in particular, may be used in the field of body worn devices such as wrist-worn devices or head-worn devices. For example, the biological sensor may be worn on the wrist or head. Other measurement positions, however, are possible, such as chest or finger. Other fields of application of the present invention, however, are feasible.
Wearable sensors are broadly used for collecting physiological and behavioral signals, for health monitoring and even as medical devices, as described in Coravos, A., Khozin, S., and Mandl, K. D., “Developing and adopting safe and effective digital biomarkers to improve patient outcomes”, npj Digital Medicine, 2(14), 2019. Predictions of these health monitoring tools or medical devices are only as reliable as the sensor data used. Sensor data quality may depend on hardware and can be highly prone to noise. Therefore, the signal quality and actual feature estimates have been shown to vary, as described in Sequeira, N., et al., “Common wearable devices demonstrate variable accuracy in measuring heart rate during supraventricular tachycardia”, Heart Rhythm, 17(5), 2020, and Pasadyn, S. R., et al., “Accuracy of commercially available heart rate monitors in athletes: A prospective study”, Cardiovascular Diagnosis and Therapy, 9(4):379-385, 2019. In order to be able to make reliable predictions or assumptions regarding the wellbeing of the person carrying the sensor device, it may be necessary to be confident that only reliable data or signals are used. The signal quality of sensors such as photoplethysmograph (PPG), electrocardiogram (ECG) or electroencephalogram (EEG) sensors may be negatively influenced by factors such as motion artifacts, sensor placement, and even blood perfusion or skin type, as described e.g. in Bent, B., et al., “Investigating sources of inaccuracy in wearable optical heart rate sensors”, npj Digital Medicine, 3(18), 2020.
Thus, there is a need for a reliable classification of clean versus noisy signals, which currently does not exist as a standard methodology. Instead, for different applications, sensor-specific heuristics tend to be used, see e.g. Bhowmik, T., et al., “A novel method for accurate estimation of HRV from smartwatch PPG signals”, in IEEE Engineering in Medicine and Biology Society, pp. 109-112, 2017, or methodologies that report a continuous quality index, which then poses the non-straightforward question “what should be the signal quality threshold to label a signal as clean or noisy”, as described in Orphanidou, C., et al., “Signal-quality indices for the electrocardiogram and photoplethysmogram: Derivation and applications to wireless monitoring”, IEEE Journal of Biomedical and Health Informatics, 19(3):832-838, 2014, Elgendi, M., “Optimal signal quality index for photoplethysmogram signals”, Scientific Reports, 3(4), 2016, and Zanon, M., et al., “A quality metric for heart rate variability from photoplethysmogram sensor data”, in IEEE Engineering in Medicine and Biology Society, pp. 706-709, 2020.
US 2019/133468 A1 describes an apparatus which includes a sensor module, a data processing module, a quality assessment module and an event prediction module. The sensor module provides biosignal data samples and motion data samples. The data processing module processes the biosignal data samples to remove baseline and processes the motion data samples to generate a motion significance measure. The quality assessment module generates a signal quality indicator based on the processed biosignal data sample segments and the corresponding motion significance measure using a first deep learning model. The event prediction module generates an event prediction result based on the processed biosignal data sample segments associated with a desired signal quality indicator using a second deep learning model.
It is therefore desirable to provide methods and devices which address the above-mentioned technical challenges of using wearable sensors for health monitoring and/or as medical devices. Specifically, methods and devices shall be proposed which overcome the need for extensive manual annotations.
This problem is addressed by a computer implemented method for classifying quality of biological sensor data with the features of the independent claims. Advantageous embodiments which might be realized in an isolated fashion or in any arbitrary combinations are listed in the dependent claims as well as throughout the specification.
As used in the following, the terms “have”, “comprise” or “include” or any arbitrary grammatical variations thereof are used in a non-exclusive way. Thus, these terms may both refer to a situation in which, besides the feature introduced by these terms, no further features are present in the entity described in this context and to a situation in which one or more further features are present. As an example, the expressions “A has B”, “A comprises B” and “A includes B” may both refer to a situation in which, besides B, no other element is present in A (i.e. a situation in which A solely and exclusively consists of B) and to a situation in which, besides B, one or more further elements are present in entity A, such as element C, elements C and D or even further elements.
Further, it shall be noted that the terms “at least one”, “one or more” or similar expressions indicating that a feature or element may be present once or more than once typically will be used only once when introducing the respective feature or element. In the following, in most cases, when referring to the respective feature or element, the expressions “at least one” or “one or more” will not be repeated, notwithstanding the fact that the respective feature or element may be present once or more than once.
Further, as used in the following, the terms “preferably”, “more preferably”, “particularly”, “more particularly”, “specifically”, “more specifically” or similar terms are used in conjunction with optional features, without restricting alternative possibilities. Thus, features introduced by these terms are optional features and are not intended to restrict the scope of the claims in any way. The invention may, as the skilled person will recognize, be performed by using alternative features. Similarly, features introduced by “in an embodiment of the invention” or similar expressions are intended to be optional features, without any restriction regarding alternative embodiments of the invention, without any restrictions regarding the scope of the invention and without any restriction regarding the possibility of combining the features introduced in such way with other optional or non-optional features of the invention.
In a first aspect of the invention, a computer implemented method for classifying quality of biological sensor data is disclosed.
The term “computer implemented method” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a method involving at least one computer and/or at least one computer network or a cloud. The computer and/or computer network and/or a cloud may comprise at least one processor which is configured for performing at least one of the method steps of the method according to the present invention. Preferably each of the method steps is performed by the computer and/or computer network and/or a cloud. The method may be performed completely automatically, specifically without user interaction. The term “automatically” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a process which is performed completely by means of at least one computer and/or computer network and/or a cloud and/or machine, in particular without manual action and/or interaction with a user.
The method comprises the following steps which, as an example, may be performed in the given order. It shall be noted, however, that a different order is also possible. Further, it is also possible to perform one or more of the method steps once or repeatedly. Further, it is possible to perform two or more of the method steps simultaneously or in a temporally overlapping fashion. The method may comprise further method steps which are not listed.
The method comprises the following steps:
The term “biological sensor” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to an arbitrary device configured for one or more of detecting, measuring or monitoring at least one biological measurement variable or biological measurement property. Specifically, the biological sensor may be capable of generating at least one signal, such as a measurement signal, which is a qualitative or quantitative indicator of the measurement variable and/or measurement property. The biological sensor may be configured for qualitatively and/or quantitatively determining at least one health condition and/or at least one measurement variable indicative of a health condition of a subject. The term “subject” as used herein refers to an animal, preferably a mammal and, more typically, a human. The biological sensor may be configured for detecting and/or measuring either quantitatively or qualitatively at least one biological and/or physical and/or chemical parameter of the subject and for transforming the detected and/or measured parameter into at least one signal such as for further processing and/or analysis.
The biological sensor may be a portable, in particular handheld and/or wearable, biological sensor. The term “portable” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a property of the biological sensor allowing a user to hold and/or wear and/or transport the biological sensor. Specifically, the biological sensor may be wearable. For example, the biological sensor may be a wristwatch such as a smartwatch. Other measurement positions, however, are possible, such as head, chest or finger. When a portable biological sensor is used, disturbances such as motion artefacts may influence the measurement. Uncontrolled conditions met in daily life may pose several challenges related to disturbances that can deteriorate the signal, making the determination of the health condition untrustworthy and unreliable.
The biological sensor may be or may comprise one or more of at least one photoplethysmogram (PPG) device, at least one electrocardiogram (ECG) device and at least one electroencephalogram (EEG) device. However, other biological sensors are feasible.
For example, the biological sensor may be at least one portable photoplethysmogram device. The biological sensor data may comprise at least one photoplethysmogram obtained by the portable photoplethysmogram device. The term “photoplethysmogram device” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to at least one device configured for determining at least one photoplethysmogram. The term “plethysmogram” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a result of a measurement of volume changes of at least one part of the human body or of organs. The term “photoplethysmogram” (PPG) as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to an optically determined plethysmogram. The PPG may show the time course of the signal of the PPG device.
The photoplethysmogram device may comprise at least one illumination source. The term “illumination source” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to at least one arbitrary device configured for generating at least one light beam. The illumination source may comprise at least one light source such as at least one light-emitting diode (LED) transmitter. The illumination source may be configured for generating at least one light beam for illuminating e.g. the skin on at least one part of the human body. The illumination source may be configured for generating light in the red, infrared or green spectral region. As used herein, the term “light”, generally, refers to a partition of electromagnetic radiation which is, usually, referred to as “optical spectral range” and which comprises one or more of the visible spectral range, the ultraviolet spectral range and the infrared spectral range. Herein, the term “ultraviolet spectral range”, generally, refers to electromagnetic radiation having a wavelength of 1 nm to 380 nm, preferably of 100 nm to 380 nm. The term “visible spectral range”, generally, refers to a spectral range of 380 nm to 760 nm. The term “infrared spectral range” (IR) generally refers to electromagnetic radiation of 760 nm to 1000 μm, wherein the range of 760 nm to 1.5 μm is usually denominated as “near infrared spectral range” (NIR) while the range from 1.5 μm to 15 μm is denoted as “mid infrared spectral range” (MidIR) and the range from 15 μm to 1000 μm as “far infrared spectral range” (FIR).
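The spectral ranges defined above can be expressed as a small lookup, which makes it easy to check where a given LED wavelength falls. This is a sketch only: the function name `spectral_range`, the half-open treatment of the range boundaries, and the example wavelengths in the usage lines (green around 530 nm, red around 660 nm, infrared around 940 nm, which are typical for PPG devices but are not specified in this text) are assumptions.

```python
def spectral_range(wavelength_nm: float) -> str:
    """Map a wavelength in nm to the spectral range names defined in the text.

    Boundaries are treated as half-open intervals [lower, upper) (an assumption),
    except that 1000 um (1e6 nm) is still counted as far infrared.
    """
    ranges = [
        (1.0, 380.0, "ultraviolet"),
        (380.0, 760.0, "visible"),
        (760.0, 1_500.0, "near infrared (NIR)"),
        (1_500.0, 15_000.0, "mid infrared (MidIR)"),
        (15_000.0, 1_000_000.0, "far infrared (FIR)"),
    ]
    for lower, upper, name in ranges:
        if lower <= wavelength_nm < upper:
            return name
    if wavelength_nm == 1_000_000.0:
        return "far infrared (FIR)"
    raise ValueError(f"{wavelength_nm} nm lies outside the optical spectral range")


# Hypothetical LED wavelengths for a PPG device (typical values, not from the text)
for led_nm in (530, 660, 940):
    print(led_nm, "nm ->", spectral_range(led_nm))
```

With these assumed values, green and red LEDs fall into the visible range and a 940 nm LED into the near infrared range.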
The photoplethysmogram device may comprise at least one photodetector, in particular at least one photosensitive diode. The term “photodetector” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to at least one light-sensitive device for detecting a light beam, such as for detecting an illumination generated by at least one light beam. The photodetector may be configured for detecting light from transmissive absorption and/or reflection in response to illumination by the light generated by the illumination source.
The PPG device may be configured for measuring blood volume variations due to heartbeat by shining light into the skin and measuring the light that is reflected back. With respect to the design of a PPG device, reference is made to Biswas, D., et al., “Heart rate estimation from wrist-worn photoplethysmography: A review”, IEEE Sensors Journal, 19(16):6560-6570, 2019. Specifically, the PPG may represent an aggregated expression of many physiological processes within the cardiovascular system, as described in Liang, Y., et al., “An optimal filter for short photoplethysmogram signals”, Scientific Data, 5(180076), 2018. When the PPG signal is reliable, it may be possible to compute heart rate (HR) and heart rate variability (HRV) features, e.g. in order to understand multiple aspects of a person's physical, psychological and mental state, such as exercise recovery, see e.g. Bechke, E., et al., “An examination of single day vs. multi-day heart rate variability and its relationship to heart rate recovery following maximal aerobic exercise in females”, Scientific Reports, 10(14760), 2020, cardiovascular conditions, see e.g. Hoshi, R. A., et al., “Reduced heart-rate variability and increased risk of hypertension - a prospective study of the ELSA-Brasil”, Journal of Human Hypertension, 2021, sleeping patterns, see e.g. Hietakoste, S., et al., “Longer apneas and hypopneas are associated with greater ultra-short-term HRV in obstructive sleep apnea”, Scientific Reports, 10(21556), 2020, anxiety, see e.g. Rodrigues, J., et al., “Locomotion in virtual environments predicts cardiovascular responsiveness to subsequent stressful challenges”, Nature Communications, 11(5904), 2020, and emotional state, see e.g. Kim, J. J., et al., “Neurophysiological and behavioral markers of compassion”, Scientific Reports, 10(6789), 2020.
The term “biological sensor data” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to data obtained via the biological sensor such as measurement data. The biological sensor data comprises at least one signal, also denoted as sensor signal. The term “signal” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to at least one electrical signal, such as at least one analogue electrical signal and/or at least one digital electrical signal. More specifically, the sensor signal may be or may comprise at least one voltage signal and/or at least one current signal. More specifically, the sensor signal may comprise at least one photocurrent. For example, the signal may be at least one electronic signal of the PPG device, in particular of the photodetector, depending on detected light from transmissive absorption and/or reflection in response to illumination by the light generated by the illumination source.
Further, either raw signals may be used, or processed or preprocessed signals may be used, thereby generating secondary signals, which may also be used as sensor signals. The method may comprise at least one pre-processing step comprising one or more of filtering or normalizing the biological sensor data. For example, in case of a signal of the PPG device, a bandpass filter may be used. Additionally, the signal may be normalized so that the values are centered around 0. However, preprocessing may differ between signals depending on the underlying physiology.
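A minimal sketch of such a pre-processing step in Python, assuming NumPy and SciPy are available. The default parameters (a third-order Butterworth bandpass with 0.5 Hz and 9 Hz cut-offs at 20 Hz sampling) are taken from the PPG example in this description. The zero-phase filtering with `scipy.signal.sosfiltfilt`, the z-score normalization used to center the values around 0, and the function name `preprocess_ppg` are assumptions, not part of the described method.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def preprocess_ppg(signal, fs=20.0, low_hz=0.5, high_hz=9.0, order=3):
    """Bandpass-filter a raw PPG trace and z-normalize it (hypothetical helper).

    Assumptions: zero-phase (forward-backward) filtering, which uses future samples
    and therefore suits offline processing of recorded daily signals, and z-score
    normalization as the way of centering values around 0.
    """
    x = np.asarray(signal, dtype=float)
    # Third-order Butterworth bandpass as second-order sections for numerical stability
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)
    centered = filtered - filtered.mean()
    std = centered.std()
    # Guard against a flat signal, which cannot be scaled to unit variance
    return centered if std == 0 else centered / std
```

The high cut-off must stay below the Nyquist frequency (10 Hz at 20 Hz sampling), which the 9 Hz value in the example satisfies.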
For example, the signal may be a PPG signal. PPG signals can easily be extracted from human peripheral tissue, such as fingers, toes, earlobes, wrists, and the forehead. Therefore, they may have great potential for application in wearable health devices, as described e.g. in Liang et al. For example, the PPG signals may be collected via a smartwatch, in particular a smartwatch on the wrist equipped with LEDs and a photodiode. Generally, any sampling frequency is possible, although a high sampling frequency may be preferred. For example, the photoplethysmogram device, e.g. the smartwatch, may be configured for measuring a PPG at a sampling frequency of 20 Hz. For example, the photoplethysmogram device may be configured for measuring a PPG at a sampling frequency from 20 Hz to 1 kHz. The smartwatch may be a commercially available smartwatch, e.g. a Samsung Gear® Sport smartwatch. The PPG signals may be pre-processed by applying a third-order Butterworth bandpass filter with cut-off frequencies of 0.5 Hz and 9 Hz to the daily PPG signal of each subject. The daily PPG signal may then be cut into intervals, e.g. 10-second intervals.
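Cutting the filtered daily signal into 10-second intervals can be sketched as follows. At 20 Hz, each interval holds 20 × 10 = 200 samples. Non-overlapping windows match the training data described later; dropping trailing samples that do not fill a complete window is an assumption, as is the hypothetical function name `segment_windows`.

```python
import numpy as np


def segment_windows(signal, fs=20, window_s=10):
    """Cut a (pre-processed) daily PPG trace into non-overlapping windows.

    Returns an array of shape (n_windows, fs * window_s). Trailing samples that
    do not fill a whole window are dropped (an assumption).
    """
    x = np.asarray(signal, dtype=float)
    win = int(round(fs * window_s))  # 20 Hz * 10 s = 200 samples per window
    n_windows = len(x) // win
    return x[: n_windows * win].reshape(n_windows, win)


windows = segment_windows(np.arange(1234, dtype=float))
print(windows.shape)  # (6, 200): the last 34 samples are discarded
```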
The term “providing” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to measuring the biological sensor data and/or retrieving the biological sensor data. The term “retrieving” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a process of a system, specifically a computer system, of generating data and/or obtaining data from the biological sensor and/or a data storage, e.g. from a network or from a further computer or computer system. The retrieving specifically may take place via at least one computer interface, such as a port, e.g. a serial or parallel port. The retrieving may comprise several sub-steps, such as the sub-step of obtaining one or more items of primary information and generating secondary information by making use of the primary information, such as by applying one or more algorithms to the primary information, e.g. by using a processor.
The term “quality”, also denoted as signal quality, as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a measure for reliability of a signal determined by the biological sensor. Specifically, the quality may be classified as good for reliable signals and as bad for non-reliable signals. The classifying of quality may comprise discriminating between noisy and clean signals. The quality may be classified dependent on presence of noise and/or artifacts. The reliability of the signal may decrease with increasing noise and/or artifacts. The quality may be negatively influenced by a plurality of factors such as motion artifacts, sensor placement, blood perfusion and/or skin type.
The quality may be used as a quality indicator for heart rate variability data. The term “heart rate variability” (HRV) as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a measure of the variation of the time intervals between consecutive heartbeats. The quality may be used for distinguishing between acceptable and non-acceptable heart rate variability data.
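One way to use the quality as an indicator for HRV data is to compute an HRV feature only for signals classified as clean. The sketch below uses RMSSD (root mean square of successive differences of inter-beat intervals) as an example HRV feature. The choice of RMSSD, the assumption that inter-beat intervals in milliseconds have already been extracted from the PPG window (e.g. by peak detection, not shown), and the function names are illustrative assumptions. The label convention, 1 for clean and 0 for noisy, follows the training labels used in this description.

```python
import numpy as np

CLEAN, NOISY = 1, 0  # label convention used for training in this description


def rmssd_ms(ibi_ms):
    """RMSSD: root mean square of successive differences of inter-beat intervals (ms)."""
    ibi = np.asarray(ibi_ms, dtype=float)
    if ibi.size < 2:
        raise ValueError("need at least two inter-beat intervals")
    return float(np.sqrt(np.mean(np.diff(ibi) ** 2)))


def hrv_if_acceptable(ibi_ms, quality_label):
    """Return RMSSD for a clean window; None (non-acceptable HRV data) for a noisy one."""
    if quality_label != CLEAN:
        return None
    return rmssd_ms(ibi_ms)


# Hypothetical inter-beat intervals of one 10 s window
print(hrv_if_acceptable([800, 810, 790, 800], CLEAN))  # sqrt(200), about 14.14 ms
print(hrv_if_acceptable([800, 810, 790, 800], NOISY))  # None
```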
The quality of the obtained biological sensor data may be provided to a user, such as the subject, via at least one user-interface. The term “user interface” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term may refer, without limitation, to an element configured for interacting with its environment, such as for the purpose of unidirectionally or bidirectionally exchanging information, such as for exchange of one or more of data or commands. For example, the user interface of the smartwatch may be configured to share information with a user and to receive information by the user. The user interface may be designed to interact visually with a user, such as a display, and/or to interact acoustically with the user. The user interface, as an example, may comprise one or more of: a graphical user interface; a data interface, such as a wireless and/or a wire-bound data interface. Thus, the provided quality may be used for interpreting biological sensor data obtained by the biological sensor. Additionally or alternatively, the biological sensor, such as the smartwatch, may comprise at least one controlling unit configured for dismissing and/or rejecting biological sensor data categorized as noisy or bad quality.
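A controlling unit that dismisses noisy data could, for example, compare the model output with a threshold. In this sketch, the model output is assumed to be the probability that a window is clean; the threshold of 0.5 and the function name `dismiss_noisy` are hypothetical choices, not values given in this description.

```python
import numpy as np


def dismiss_noisy(windows, p_clean, threshold=0.5):
    """Keep only windows whose predicted probability of being clean reaches the threshold.

    Returns the kept windows and a boolean mask (True = kept), so that the caller can,
    for example, report the share of rejected data via the user interface.
    """
    windows = np.asarray(windows)
    p_clean = np.asarray(p_clean, dtype=float)
    keep = p_clean >= threshold
    return windows[keep], keep


kept, mask = dismiss_noisy(np.arange(8).reshape(4, 2), [0.9, 0.2, 0.5, 0.1])
print(kept.tolist(), f"{(~mask).mean():.0%} rejected")  # [[0, 1], [4, 5]] 50% rejected
```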
The term “classifying” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a process of categorizing the signal into at least two categories, such as noisy or clean signal.
Classifying quality of the signal is performed by using at least one trained trainable model. The term “trainable model” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a mathematical model which is trainable on at least one training dataset using one or more of machine learning, in particular deep learning, or another form of artificial intelligence. The term “machine learning” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a method of using artificial intelligence (AI) for automatic model building. The term “deep learning” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a class of machine learning algorithms using multiple layers, in particular using deep learning architectures such as one or more of deep neural networks, deep belief networks, graph neural networks, recurrent neural networks and convolutional neural networks. For example, the trainable model may comprise at least one deep neural network selected from the group consisting of: a convolutional neural network (CNN), such as a network with convolutional layers as in the WaveNet architecture; a recurrent neural network (RNN); a long short-term memory (LSTM) network. For example, an architecture inspired by the WaveNet architecture may be used. With respect to WaveNet, reference is made to van den Oord, et al., “Wavenet: A generative model for raw audio”, CoRR, abs/1609.03499, 2016. The deep neural network may use stacked causal dilated convolutions.
Using a WaveNet-like architecture on PPG data is a novel and unique approach. The skilled person would not use a WaveNet-like architecture, because it was originally developed for speech data and thus for a very different data type. However, it was surprisingly found that using a WaveNet-like architecture on PPG data allows for classifying the quality of PPG data with increased reliability.
The training may be performed using at least one machine-learning system. The term “machine-learning system” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a system or unit comprising at least one processing unit such as a processor, microprocessor, or computer system configured for machine learning, in particular for executing a logic in a given algorithm. The machine-learning system may be configured for performing and/or executing at least one machine-learning algorithm, wherein the machine-learning algorithm is configured for building the trained trainable model. The machine-learning system may be part of the biological sensor and/or may be implemented by an external processor.
The term “training” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a process of building the trained trainable model, in particular determining parameters, in particular weights, of the model. The training may comprise determining and/or updating parameters of the model. The trained trainable model may be at least partially data driven. As used herein, the term “at least partially data-driven model” is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to the fact that the model comprises data-driven model parts and other model parts based on physico-chemical laws. The training may be performed on biological sensor data. The training may comprise retraining a trained trainable model, e.g. after obtaining additional biological sensor data such as during wearing and operating the smartwatch.
The trainable model is trained on historical biological sensor data based on a supervised and/or semi-supervised deep learning architecture. The term “historical biological sensor data” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to at least one independent data set used for training of the deep learning architecture. The historical biological sensor data is independent from the patient, runtime or test data.
The method further may comprise:
The trainable model based on the supervised deep learning architecture may also be denoted as supervised model herein. The trainable model based on the semi-supervised deep learning architecture may also be denoted as semi-supervised model herein.
The term “supervised” deep learning architecture as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a deep learning architecture learning based on labeled historical biological sensor data. In particular, manually labeled historical biological sensor data may be used for training the trainable model based on the supervised deep learning architecture. For example, as historical biological sensor data, a manually labeled PPG dataset may be used. For example, a training dataset of biological sensor data may be set up as follows: Data was collected from 5 healthy volunteers (1 female and 4 male, average age 33 years) without any supervision, during their normal daily activities or during their night sleep. In total, 13547 non-overlapping PPG signal samples of 10 seconds length each were collected. The signals may be manually labeled by experts according to the instructions of Elgendi, M., “Optimal signal quality index for photoplethysmogram signals”, Scientific Reports, 3(4), 2016. Of these, 8305 signals were categorized as noisy and 5242 as clean. For example, a balanced dataset of 9380 labeled signal samples may be used as training dataset.
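The text does not state how the balanced dataset was obtained. A balanced set of 9380 samples corresponds to 4690 samples per class; one plausible way to build it, shown here purely as an assumption, is random undersampling of both classes. The function name `balance_by_undersampling` is hypothetical, and labels are encoded as 0 for noisy and 1 for clean, as in the training labels of this description.

```python
import numpy as np


def balance_by_undersampling(X, y, n_per_class=None, seed=0):
    """Randomly undersample so that every class contributes the same number of samples.

    If n_per_class is None, the size of the smallest class is used.
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n = int(counts.min()) if n_per_class is None else n_per_class
    # Draw n indices per class without replacement, then shuffle the combined set
    idx = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n, replace=False) for c in classes]
    )
    rng.shuffle(idx)
    return X[idx], y[idx]


# Label counts from the example dataset: 8305 noisy (0) and 5242 clean (1) windows
y = np.array([0] * 8305 + [1] * 5242)
X = np.arange(len(y))  # stand-in for the (13547, 200) array of PPG windows
Xb, yb = balance_by_undersampling(X, y, n_per_class=4690)
print(len(yb), np.bincount(yb))  # 9380 [4690 4690]
```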
The training step may comprise preprocessing the historical biological sensor data, e.g. filtering the PPG signals of 10 seconds each, sampled at 20 Hz, i.e. 200 data points per signal. For training, a label may be provided for each input signal, with “0” indicating a noisy and “1” a clean signal.
The supervised deep learning architecture may comprise at least one input layer receiving the historical biological sensor data and/or preprocessed historical biological sensor data. For example, filtered PPG signals of 10 seconds each, sampled at 20 Hz, may be used as input. Thus, the input may comprise a signal comprising 200 values. With different sampling frequencies or signal lengths, the number of values per input signal would vary accordingly.
The supervised deep learning architecture may comprise a plurality of convolutional layers, in particular a stack of convolutional layers. For example, the supervised deep learning architecture may comprise five convolutional layers. The convolutional layers may be designed with dilation, i.e. configured for dilated convolution. The supervised deep learning architecture may comprise a WaveNet-like neural network architecture. As described e.g. in van den Oord, et al., “Wavenet: A generative model for raw audio”, CoRR, abs/1609.03499, 2016, the main ingredient of WaveNet is the causal convolution. By using causal convolutions, it may be possible to ensure that the deep learning architecture cannot violate the ordering in which the data is modeled, in particular that an output cannot depend on any future time steps. A deep learning architecture with causal convolutions has no recurrent connections, such that it is typically faster to train than an RNN, especially when applied to very long sequences. One problem of causal convolutions, however, is that they require many layers or large filters to increase the receptive field. Therefore, WaveNet-like architectures may use dilated convolutions to increase the receptive field by orders of magnitude without greatly increasing the computational cost. The term “dilated convolution” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a convolution where a filter is applied over an area larger than its length by skipping input values with a certain step. It may be equivalent to a convolution with a larger filter derived from the original filter by dilating it with zeros, but may be significantly more efficient.
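The following NumPy sketch illustrates a single-channel, single-filter causal dilated convolution with causal (left-only) zero padding, so that each output only depends on the current and past inputs and the output length equals the input length. It is a didactic illustration under these simplifying assumptions, not the multi-filter layers of a trained model; the function name is hypothetical. As in common deep learning frameworks, the kernel is applied without flipping.

```python
import numpy as np


def causal_dilated_conv1d(x, kernel, dilation=1):
    """Single-channel causal dilated 1-D convolution (cross-correlation form).

    y[t] = sum_j kernel[j] * x[t - (k - 1 - j) * dilation], with zeros for negative
    indices, i.e. only x[t], x[t - d], ..., x[t - (k - 1) * d] contribute.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(kernel, dtype=float)
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    pad = (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # causal padding: zeros on the left only
    y = np.empty_like(x)
    for t in range(len(x)):
        # Taps xp[t], xp[t + d], ..., xp[t + pad] correspond to x[t - pad], ..., x[t]
        taps = xp[t : t + pad + 1 : dilation]
        y[t] = taps @ w
    return y


impulse = np.zeros(10)
impulse[0] = 1.0
print(causal_dilated_conv1d(impulse, [1.0, 2.0, 3.0], dilation=2))
# [3. 0. 2. 0. 1. 0. 0. 0. 0. 0.]
```

The impulse response shows the skipping of input values: with a kernel of size 3 and a dilation of 2, the filter spans 5 samples while using only 3 weights.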
A dilated convolution may effectively allow the network to operate on a coarser scale than a normal convolution. This may be similar to pooling or strided convolutions, but here the output may have the same size as the input. In particular, stacking dilated convolutional layers may enable the network to have very large receptive fields with just a few layers, while preserving the input resolution throughout the network as well as computational efficiency.
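The supervised deep learning architecture may comprise causal padding in each convolutional layer, i.e. the input of each convolutional layer may be padded with zeros at its beginning only, such that the output has the same length as the input and depends only on current and past input values.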
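The interplay of causal convolution, dilation and causal padding described above can be sketched for a single channel in plain Python. This is an illustrative sketch, not the network of the invention: deep learning frameworks provide the same operation, for instance Keras `Conv1D` with `padding="causal"` and a `dilation_rate`, and, as in those frameworks, the "convolution" here is computed as a cross-correlation. The function names are hypothetical.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """Single-channel causal dilated 1-D convolution with causal (left) zero padding.

    y[t] = sum_j kernel[j] * x[t - (K - 1 - j) * dilation], treating x[negative] as 0.
    The output has the same length as the input and y[t] never depends on x[t + 1:].
    """
    k = len(kernel)
    pad = (k - 1) * dilation  # causal padding: zeros only at the start
    padded = [0.0] * pad + list(x)
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of causal dilated layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0]
print(causal_dilated_conv1d(x, [1.0, 1.0], dilation=2))  # [1.0, 2.0, 4.0, 6.0]
print(receptive_field(3, [1, 2, 4, 8, 16]))               # 63
```

With a kernel of size 2 and a dilation of 2, each output combines the current value with the value two steps earlier, skipping the one in between. The receptive field grows with the sum of the dilations, so doubling the dilation per layer widens it quickly while the output length stays equal to the input length.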
The supervised deep learning architecture may comprise at least one flatten layer after the convolutional layers and before the outputs. The flatten layer may be designed to transform the matrix output of the convolutional layers into a one-dimensional vector that can be fed into a dense layer. A dense layer may be a neural network layer in which each neuron is connected to all outputs of the preceding layer.
The supervised deep learning architecture may comprise at least one optimizer, in particular an Adam optimizer. With respect to the Adam optimizer, reference is made to Diederik P. Kingma, Jimmy Ba, “Adam: A Method for Stochastic Optimization”, 3rd International Conference on Learning Representations, San Diego, 2015.
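For readers unfamiliar with the Adam optimizer referenced above, the following minimal sketch shows one Adam update for a flat list of scalar parameters, following the update rule of Kingma and Ba. It is a plain-Python illustration, not a training implementation. The default values `beta1=0.9`, `beta2=0.999` and `eps=1e-8` are the defaults suggested in that paper and are assumptions here, while `lr=1e-5` mirrors the example learning rate given below.

```python
import math


def init_adam_state(n_params):
    """First/second moment estimates and step counter for n_params scalar parameters."""
    return {"t": 0, "m": [0.0] * n_params, "v": [0.0] * n_params}


def adam_step(params, grads, state, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; returns new params and updates the state in place."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # biased first moment
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # biased second moment
        m_hat = state["m"][i] / (1 - beta1 ** t)                      # bias corrections
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Usage on f(w) = w^2 (gradient 2w); a large lr is used only to make progress visible.
w, state = [1.0], init_adam_state(1)
for _ in range(5):
    w = adam_step(w, [2 * w[0]], state, lr=0.1)
print(w)
```

Because the update is normalized by the running root-mean-square of the gradients, the step size is governed mainly by the learning rate rather than by the gradient magnitude. This is one reason why a very small learning rate, such as 0.00001, combined with a decay schedule is a natural choice.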
For example, the supervised deep learning architecture may comprise five convolutional layers. The first layer may have no dilation (i.e., a dilation of 1), the second one a dilation of 2, and from there on the dilation may double for each subsequent layer. In each layer, 16 filters may be used. A kernel of size 3, 5, 7 or even other sizes may be used. A regularization strength may be in the range between 0.0005 and 0.002, e.g. 0.0005, 0.001, 0.0015 or 0.002. However, other ranges are possible. The supervised deep learning architecture may comprise a flatten layer after the convolutional layers and before the outputs. The supervised deep learning architecture may comprise an Adam optimizer, e.g. with a learning rate of 0.00001 and with a decay where the learning rate is halved every ten, 100 or more epochs. However, other learning rates and learning rate decays are possible. For example, for training, a batch size of 128 and up to 300 epochs, or from 50 to 500 or even more epochs, may be used. However, other batch sizes and numbers of epochs are possible.
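The example hyperparameters above can be collected in a small configuration object, which also makes two derived quantities explicit: the receptive field of the five-layer dilated stack and the step-decayed learning rate. This is a hypothetical sketch. The class name `DilatedNetConfig` is illustrative, and treating the regularization strength as an L2 weight penalty is an assumption, because the text does not name the regularization type.

```python
from dataclasses import dataclass


@dataclass
class DilatedNetConfig:
    """Hypothetical container for the example hyperparameters listed above."""
    n_layers: int = 5
    filters_per_layer: int = 16
    kernel_size: int = 3          # 3, 5, 7 or other sizes
    l2_strength: float = 0.001    # assumed L2 penalty, e.g. 0.0005 to 0.002
    learning_rate: float = 1e-5
    halve_every: int = 10         # epochs between learning-rate halvings
    batch_size: int = 128
    max_epochs: int = 300

    def dilations(self):
        """1 (no dilation) for the first layer, then doubling: 2, 4, 8, ..."""
        return [2 ** i for i in range(self.n_layers)]

    def receptive_field(self):
        """Input samples seen by one output of the causal dilated stack."""
        return 1 + (self.kernel_size - 1) * sum(self.dilations())

    def lr_at_epoch(self, epoch):
        """Step decay: the learning rate is halved every `halve_every` epochs."""
        return self.learning_rate * 0.5 ** (epoch // self.halve_every)


cfg = DilatedNetConfig()
print(cfg.dilations())        # [1, 2, 4, 8, 16]
print(cfg.receptive_field())  # 63
```

Under these assumptions, a kernel of size 3 gives a receptive field of 63 samples (about 3.2 s at 20 Hz), while a kernel of size 7 gives 187 samples, which still fits within a 200-value input window.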
The deep learning architecture may comprise, as final layer, an output layer, in particular a dense layer, comprising two paths. Each of the paths may comprise an output. Specifically, the deep learning architecture may comprise two outputs, a first output and a second output. The trainable model is trained by optimizing one loss function in terms of classification or two loss functions in terms of signal reconstruction and classification. The supervised deep learning architecture may be trained by optimizing either one loss function in terms of classification or two loss functions in terms of signal reconstruction and classification. The semi-supervised deep learning architecture may be trained by optimizing two loss functions in terms of signal reconstruction and classification. The term “loss function” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a function that assigns to each decision, in the form of a point estimate, a range estimate, or a test, the loss that results from the decision deviating from the true parameter. The training of the trainable model may comprise solving an optimization problem, in particular optimizing the loss functions. The term “optimizing a loss function” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a process of minimizing the loss function. Specifically, the trainable model may be trained by optimizing a first loss function in terms of classification, or the first loss function and a second loss function in terms of signal reconstruction.
The first output may be a class output providing the classified quality. The class output may use the first loss function, which may relate to a classification loss. For example, the class output may take the output of the flatten or dense layer as input. The class output may use a sigmoid activation function, such that it provides a probability between 0 and 1, with a value of 0.5 or more indicating that the signal is clean. As the first loss function, the class output may use a binary cross-entropy loss function.
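The class output described above can be sketched for a single sample as a sigmoid applied to a logit, a binary cross-entropy loss on the resulting probability, and a 0.5 threshold for the label. This is a minimal sketch under stated assumptions: the label encoding 1 = clean, 0 = noisy, the clipping constant `eps`, and the function names are illustrative and not taken from the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid mapping a logit to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_crossentropy(y_true, p_clean, eps=1e-7):
    """BCE for one sample; y_true is 1 for clean and 0 for noisy (assumed encoding)."""
    p = min(max(p_clean, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def quality_label(p_clean, threshold=0.5):
    """Map the class-output probability to a label: >= threshold means clean."""
    return "clean" if p_clean >= threshold else "noisy"


p = sigmoid(1.2)  # hypothetical logit from the last dense layer
print(quality_label(p), round(binary_crossentropy(1, p), 4))
```

The loss is small when a clean signal receives a probability close to 1 and grows without bound as that probability approaches 0, which pushes the class output toward confident, correct probabilities.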
The second output may be a mean squared error (MSE) output providing a measure of the difference between the input signal and the signal reconstructed after the convolutions. The MSE output may use a Rectified Linear Unit (ReLU) activation function and an MSE loss function. The MSE loss function may quantify the difference between the reconstructed input signal and the input signal in terms of the mean squared error; a lower MSE corresponds to a better signal reconstruction. To estimate the MSE output, two extra dense layers with a ReLU activation function may be used after the flatten layer, such that the output has the same size as the input signal.
The class output may contribute to the learning of the algorithm with a weight of 1. The second output may be weighted using at least one weight, which can be varied, e.g. from 0 to 1. The range of MSE values can be much larger than 1, which is the maximum of the class output. Using a weighted second output may therefore allow balancing how much the algorithm learns about the signal reconstruction and about the class output. Empirically, it was found that lower weights can lead to slightly higher accuracy for both supervised and semi-supervised architectures. For example, for the supervised architecture, the accuracy may be 91.6% with equal weights (i.e., 1) vs. 92.5% with a lower MSE weight such as 0.1. For the semi-supervised architecture, the accuracy may be 87.7% with equal weights vs. 90.6% with a lower MSE weight such as 0.05.
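The joint objective described above can be written as a weighted sum: class loss × 1 plus MSE loss × w, with w below 1. The following sketch computes the reconstruction MSE and this weighted total for one sample. It is an illustration, not the original implementation. In Keras, this kind of per-output weighting is typically expressed through the `loss_weights` argument of `Model.compile`. The toy signals and function names are hypothetical.

```python
def mse(signal, reconstruction):
    """Mean squared error between the input signal and its reconstruction."""
    if len(signal) != len(reconstruction):
        raise ValueError("reconstruction must have the same length as the input")
    return sum((a - b) ** 2 for a, b in zip(signal, reconstruction)) / len(signal)


def weighted_total_loss(class_loss, reconstruction_loss, mse_weight=0.1, class_weight=1.0):
    """Joint objective: classification loss weighted by 1, reconstruction (MSE) loss down-weighted."""
    return class_weight * class_loss + mse_weight * reconstruction_loss


x = [0.0, 1.0, 0.0, -1.0]      # toy input signal
x_rec = [0.0, 0.8, 0.1, -1.2]  # toy reconstruction from the MSE output
print(weighted_total_loss(class_loss=0.3, reconstruction_loss=mse(x, x_rec)))
```

Because the MSE term is unbounded while the class output is at most 1, a weight such as 0.1 or 0.05 keeps the reconstruction term from dominating the gradient.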
The method may comprise at least one validation step. The validation step may be performed during training of the trainable model and may be used for monitoring the improvement of the training. The validation step may comprise validating the trainable model using at least one validation dataset. The validation dataset, for example, may comprise 1000 non-overlapping manually labelled samples of the historical biological sensor data collected from the 5 healthy volunteers, as described above, wherein these 1000 samples were not used for training.
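The requirement that validation and test samples are not used for training amounts to a disjoint split of sample indices. A minimal sketch, assuming that the samples are already non-overlapping 10-second windows addressed by index. The function name, the fixed seed and the toy sample count are hypothetical.

```python
import random


def disjoint_split(n_samples, n_val=1000, n_test=1000, seed=0):
    """Shuffle sample indices once and cut them into disjoint train/validation/test index lists."""
    if n_val + n_test > n_samples:
        raise ValueError("not enough samples for the requested validation/test sizes")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    val = idx[:n_val]
    test = idx[n_val:n_val + n_test]
    train = idx[n_val + n_test:]
    return train, val, test


train_idx, val_idx, test_idx = disjoint_split(5000)
print(len(train_idx), len(val_idx), len(test_idx))  # 3000 1000 1000
```

Shuffling once and slicing guarantees that no index appears in more than one subset. An index-level split does not, however, prevent samples from the same volunteer from appearing in several subsets; a subject-level split would require grouping by participant.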
The method may comprise at least one test step, wherein the test step comprises testing the trained trainable model. The test step may comprise testing the trained trainable model on at least one test dataset. The test step may comprise obtaining performance characteristics of the trained trainable model, e.g. precision, recall, F1-score, area under the curve (AUC).
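The performance characteristics listed above can be computed from the labels and model outputs of a test set, as in the following sketch. It assumes that "clean" is the positive class (label 1). The AUC is computed with the rank-based (Mann-Whitney) formulation, i.e. the probability that a randomly chosen clean sample scores higher than a randomly chosen noisy one. The function names and toy values are illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with label 1 (clean) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the fraction of (clean, noisy) pairs ranked correctly; ties count one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]           # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # classification threshold of 0.5
print(binary_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

Unlike accuracy, precision, recall and F1, the AUC does not depend on the chosen classification threshold, because it evaluates only the ranking of the scores.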
For example, 1000 non-overlapping manually labelled samples of the historical biological sensor data collected from the 5 healthy volunteers, as described above, were used as test data. These 1000 samples were not used for training. As a result, the following accuracy was found, wherein a classification threshold of 0.5 was used:
The classification threshold may denote a quality threshold for labelling a signal as clean or noisy: for a model output of 0.5 or more, the signal may be classified as clean; for a model output below 0.5, the signal may be classified as noisy.
For testing, the combination of epoch and regularization strength having the highest F1-score was used, in order to maintain a balance between false positives and false negatives. The classification threshold was selected in view of the trained trainable model providing a value in the range [0, 1], with 0 meaning a noisy signal and 1 a clean signal. With a classification threshold of 0.5, the signal is regarded as noisy if the output value of the trained trainable model is below 0.5, and as clean otherwise. This classification threshold may vary. Techniques for finding the optimal classification threshold are known to the skilled person, e.g. based on receiver operating characteristic (ROC) curves. For the semi-supervised model, the optimal classification threshold may be used. The method may comprise calculating the optimal classification threshold. Several options for calculating the optimal classification threshold are possible. For example, a sample of the clinical data may be used to estimate the optimal classification threshold, which may then be used for translating the probabilities output by the model into labels by thresholding at that classification threshold.
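One common ROC-based way to choose an optimal classification threshold is to maximize Youden's J statistic (true positive rate minus false positive rate) over the observed scores. The text does not specify which criterion is used, so Youden's J is an assumption here. The sketch assumes a labelled calibration sample containing both clean (1) and noisy (0) signals and applies the rule "score ≥ threshold means clean". The function name is hypothetical.

```python
def youden_threshold(y_true, scores):
    """Return the observed score maximizing TPR - FPR (Youden's J), using score >= thr -> clean."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both clean (1) and noisy (0) samples")
    best_thr, best_j = 0.5, -1.0
    for thr in sorted(set(scores)):  # every observed score is a candidate cut
        tp = sum(1 for s, t in zip(scores, y_true) if s >= thr and t == 1)
        fp = sum(1 for s, t in zip(scores, y_true) if s >= thr and t == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr


y_true = [0, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.8]  # hypothetical probabilities on a calibration sample
thr = youden_threshold(y_true, scores)
print(thr, [1 if s >= thr else 0 for s in scores])  # 0.35 [0, 0, 1, 1]
```

In this toy example, the default threshold of 0.5 would label the third signal as noisy. The estimated threshold of 0.35 separates the two classes perfectly, which illustrates why a tuned threshold can change the reported accuracy.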
Additionally or alternatively, a completely independent labeled dataset may be used for testing the trained trainable model. For example, the test data may be collected from participants different from those whose data were used for training the model. For example, 1000 non-overlapping samples from the dataset described in “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671 may be used as test dataset. As a result, the following accuracy was found, wherein a classification threshold of 0.5 was used:
The following accuracy was found in case of an optimal classification threshold:
It was found that the supervised learning performs better than a multivariate quality metric, such as proposed in “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671, because the method according to the present invention uses more information from the raw signal than the feature-based multivariate quality metric. The accuracy of the model using the multivariate quality metric from this paper on the same dataset was 84%. Using the second output relating to the reconstruction was found to help with explainability, i.e. to explain what the model actually learned from the specific input signal during the convolutions. For example, if the reconstruction shows correct peaks and a sinus waveform, then the model has learned what is important for classifying a signal as clean. It was expected that the results might be slightly lower, because learning becomes more complicated for a model having two competing loss functions. However, it was found that the method performs better than the multivariate quality metric.
For both supervised and semi-supervised deep learning architectures, using two loss functions may allow jointly learning to classify the labeled signals, i.e. to distinguish clean from noisy signals, and learning more about the physiology of the signal.
The term “semi-supervised” deep learning architecture as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a deep learning architecture learning based on labeled and unlabeled historical biological sensor data. Using unlabeled data in a semi-supervised model may improve the learning of the signal reconstruction as well as the classification performance, especially in cases of new activities and/or subjects not included in the original training dataset.
For example, in particular in addition to the labeled PPG dataset described above, an unlabeled PPG dataset may be used as historical biological sensor data for training. For example, the unlabeled PPG dataset may be set up as follows: data was collected from 20 healthy volunteers (4 female and 16 male, with an average age of 32) while they performed a series of activities in a supervised manner, switching activities every 5 minutes. For example, a protocol comprising multiple activities may be used, such as screening and informed consent process (while sitting, at rest), placement of ECG and PPG sensors (while sitting, at rest), baseline (sitting, at rest), paced breathing (ladder of increasing respiratory frequencies from 5 to 20 breaths per minute in steps of 5), 5 minutes of console gameplay (PS4 Aaero), orthostasis (standing, otherwise at rest), mental stress manipulation (Serial 7s [subtraction by 7 from 700, with eyes closed, pronouncing aloud each response]; e.g. as described in Ewing et al., 1992), physical activity manipulation (uninterrupted indoor walking along a pre-set circular path; same path for all subjects), baseline (sitting, at rest), and retrieval of the PPG/ECG equipment and debriefing. The following table lists an exemplary protocol:
The activities included sitting in a resting position, paced breathing, console gameplay, orthostasis, mental stress manipulation, physical activity, and again sitting in a resting position. Some activities are suspected to introduce different levels of motion artifacts (e.g., physical activity, orthostasis and console gameplay), while others increase the heart rate and modify the PPG waveform (e.g., paced breathing). For example, in total 37564 non-overlapping PPG signal samples of 10 seconds length each were collected. For more details on the collection of the unlabeled dataset, reference is made to “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671.
As described above, the training dataset for the semi-supervised model may comprise labeled and unlabeled historical biological sensor data. For example, the 9380 manually labeled, balanced signal samples may be used together with the unlabeled samples collected for the dataset described in “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671.
As described above, the method may comprise the at least one validation step. The validation dataset for validating the trainable model based on the semi-supervised deep learning architecture, for example, may comprise 1000 non-overlapping manually labelled samples of the historical biological sensor data collected from the 5 healthy volunteers, as described above, wherein these 1000 samples were not used for training.
As described above, the method may comprise the at least one test step. For example, for testing the trained trainable model based on the semi-supervised deep learning architecture, 1000 non-overlapping manually labelled samples of the historical biological sensor data collected from the 5 healthy volunteers, as described above, may be used as test data. These 1000 samples were not used for training. The accuracy of the semi-supervised model on the test dataset from the 5 volunteers is as follows:
Additionally or alternatively, other test data may be used for the testing. For example, 1000 non-overlapping samples from the unlabeled dataset collected as described above and in “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671 may be used as test dataset. The samples used for testing were manually annotated. For example, the dataset used for testing may comprise 796 noisy and 204 clean PPG signals. As a result, the following accuracy was found, wherein a classification threshold of 0.5 was used:
The following accuracy was found for an optimal classification threshold:
The performance of the model may be compared to the performance of a multivariate quality metric, such as proposed in “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671. The PPG signal was collected simultaneously with the ECG signal in order to compare the derived HRV features and, eventually, to estimate an HRV quality metric per signal, as described in Zanon et al., 2020. The HRV quality metric was computed for each PPG sample signal, and a PPG signal is regarded as trustworthy (i.e., clean) if the HRV quality metric value is below 20. It was found that the semi-supervised learning performs better than the multivariate quality metric.
The semi-supervised deep learning architecture may be identical to the supervised one, with the addition of using the unlabeled data in the training step and the extra input parameter zinput. Thus, with respect to the description of the semi-supervised deep learning architecture, reference is made to the supervised deep learning architecture above. For example, the semi-supervised deep learning architecture may comprise five convolutional layers. The first layer may have no dilation (i.e., a dilation of 1), the second one a dilation of 2, and from there on the dilation may double for each subsequent layer. In each layer, 16 filters may be used. A kernel of size 3, 5, 7 or even other sizes may be used. A regularization strength may be in the range between 0.0005 and 0.002, e.g. 0.0005, 0.001, 0.0015 or 0.002. However, other ranges are possible. The semi-supervised deep learning architecture may comprise a flatten layer after the convolutional layers and before the outputs. The semi-supervised deep learning architecture may comprise an Adam optimizer, e.g. with a learning rate of 0.00001 and with a decay where the learning rate is halved every ten, 100 or more epochs. However, other learning rates and learning rate decays are possible. For example, for training, a batch size of 128 and up to 300 epochs, or from 50 up to 500 or even more epochs, may be used. However, other batch sizes and numbers of epochs are possible. In each epoch, the model may be trained once with the labeled data and once with N randomly picked samples from the joined labeled and unlabeled data, where N is twice the size of the labeled set.
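The per-epoch schedule just described, one pass over the labeled data followed by N = 2 × (number of labeled samples) random picks from the joined labeled and unlabeled data, can be sketched as follows. The text does not say whether the random picks are drawn with or without replacement. This sketch assumes sampling without replacement, and the representation of samples as `(signal_id, label, z_input)` tuples is hypothetical.

```python
import random


def semi_supervised_epoch(labeled, unlabeled, rng):
    """Return the two training passes of one epoch.

    Pass 1: all labeled samples. Pass 2: N = 2 * len(labeled) samples drawn at random
    (here without replacement, an assumption) from the joined labeled and unlabeled data.
    """
    first_pass = list(labeled)
    pool = list(labeled) + list(unlabeled)
    n = min(2 * len(labeled), len(pool))
    second_pass = rng.sample(pool, n)
    return first_pass, second_pass


# Samples as hypothetical (signal_id, label, z_input) tuples; unlabeled ones carry label None.
labeled = [(f"l{i}", i % 2, 1) for i in range(4)]
unlabeled = [(f"u{i}", None, 0) for i in range(20)]
p1, p2 = semi_supervised_epoch(labeled, unlabeled, random.Random(0))
print(len(p1), len(p2))  # 4 8
```

With the example dataset sizes (9380 labeled and 37564 unlabeled samples), the second pass would draw 18760 samples from a pool of 46944. Most of these would therefore be unlabeled and would contribute only to the reconstruction loss.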
Manually labeled and unlabeled historical biological sensor data may be used for training the trainable model based on the semi-supervised deep learning architecture. For unlabeled biological sensor data, the trainable model may be trained by optimizing the loss function in terms of signal reconstruction while disregarding the loss function in terms of classification. For training, the same balanced labeled dataset as for the supervised model may be used and, in addition, the unlabeled data described above. An independent dataset, such as the 1000 non-overlapping random samples described above, may be used for testing the trained trainable model.
The present invention specifically proposes a novel WaveNet-like dilated convolutional network for cleaning PPG signal data. Obtaining annotated data is costly and time-consuming; however, large amounts of unlabeled data are available. Using a semi-supervised framework based on signal reconstruction allows for learning a good representation of the signal from unlabeled data. It was found that the different learning approaches described herein control for false positives and false negatives in different ways, while obtaining high overall accuracy. With tuning (specifically an optimal classification threshold), the semi-supervised model can outperform the supervised approach, suggesting that incorporating the large amounts of available unlabeled data can be advantageous.
The present invention proposes a novel approach of classifying data quality of biological sensors by applying a trained trainable model which allows for signal reconstruction as well as for a semi-supervised deep learning model. Signal reconstruction has not previously been applied to signals such as PPG signals, as it is a technique usually used on images. A semi-supervised approach as proposed by the present invention may require the signal reconstruction technique in order to combine the information learned from the unlabeled data with the information/class learned from the labeled data. Such an approach has not been described before. Semi-supervised approaches have previously been used only for other applications, but not for quality estimation of biological signals such as PPG signals, where existing approaches simply assume that the signal is clean.
The method may comprise introducing an extra input parameter zinput for training based on the unlabeled dataset. This may allow handling the “missing” labels. The extra input parameter may be a binary value indicating whether the specific data is labeled or not, e.g. “0” for unlabeled data and “1” for labeled data. The extra input parameter may be multiplied with the class output during the learning process, such that the class learning is not affected by unlabeled data.
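The effect of the binary indicator zinput can be illustrated with a masked per-batch loss. The text describes multiplying zinput with the class output. This sketch instead multiplies it with the per-sample classification loss, which likewise removes unlabeled samples from class learning while still letting every sample contribute to the reconstruction term, so it is a substitute illustration rather than the described mechanism itself. The tuple layout, the placeholder label and the weight 0.05 (taken from the example MSE weight for the semi-supervised architecture) are assumptions.

```python
import math


def bce(y_true, p_clean, eps=1e-7):
    """Binary cross-entropy for one sample (1 = clean, 0 = noisy)."""
    p = min(max(p_clean, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def masked_batch_loss(batch, mse_weight=0.05):
    """Mean loss over (label, p_clean, reconstruction_mse, z_input) tuples.

    z_input = 1 for labeled and 0 for unlabeled samples. Multiplying the classification
    term by z_input removes unlabeled samples from class learning, while every sample
    still contributes to the reconstruction (MSE) term.
    """
    total = 0.0
    for label, p_clean, rec_mse, z_input in batch:
        y = label if z_input == 1 else 0  # placeholder label; its term is zeroed anyway
        total += z_input * bce(y, p_clean) + mse_weight * rec_mse
    return total / len(batch)


batch = [(1, 0.9, 0.2, 1), (None, 0.1, 0.4, 0)]
print(masked_batch_loss(batch))
```

Changing the class-output probability of the unlabeled sample leaves the loss unchanged, while changing its reconstruction error does not. This is the intended separation between class learning and reconstruction learning.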
For example, the training based on the semi-supervised deep learning architecture may comprise, firstly, training with the dataset of labeled data to learn the class label and relevant information for signal reconstruction and, subsequently, training only with a random subset of the unlabeled data to better learn the signal reconstruction. The training using the unlabeled data may further introduce signal physiology that was not included in the labeled training set, e.g. from different people or different activities. Annotating data is expensive, and very few annotated datasets are available, whereas a lot of unlabeled data is available. Semi-supervised learning may leverage unlabeled data and make the most efficient use of small amounts of labeled data.
Using the semi-supervised architecture may allow expanding and/or transferring the proposed model to new datasets. For example, if there is a need to adapt the model to a new scenario with less data, such as less than 50% of the data used for training the originally trained model, e.g. because there are not enough labeled examples, it is possible to train a reliable model using the semi-supervised approach. It was found that the accuracy remains above 90% with all algorithms.
The trained trainable model may be trained based on a combination of a supervised and a semi-supervised deep learning architecture. The combination may average the predictions of the two architectures, i.e. take the average of the probabilities reported by the two architectures.
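The averaging combination described above can be sketched as a per-sample mean of the two clean-probabilities followed by the usual threshold. This assumes an equal-weight average and the default threshold of 0.5. The function names are illustrative.

```python
def ensemble_probability(p_supervised, p_semi_supervised):
    """Average of the clean-probabilities reported by the two architectures."""
    return 0.5 * (p_supervised + p_semi_supervised)


def ensemble_labels(probs_supervised, probs_semi_supervised, threshold=0.5):
    """Label each sample from the averaged probability (>= threshold means clean)."""
    return [
        "clean" if ensemble_probability(a, b) >= threshold else "noisy"
        for a, b in zip(probs_supervised, probs_semi_supervised)
    ]


print(ensemble_labels([0.8, 0.6, 0.2], [0.4, 0.2, 0.1]))  # ['clean', 'noisy', 'noisy']
```

In the second sample, the supervised model alone would label the signal as clean (0.6), but the averaged probability of 0.4 labels it as noisy. The combination therefore acts as a tie-breaker when the two architectures disagree.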
In a further aspect of the present invention, a biological sensor is disclosed. The biological sensor is configured for classifying quality of biological sensor data. The biological sensor comprises at least one measuring unit configured for providing biological sensor data comprising at least one signal. The biological sensor comprises at least one processing unit configured for classifying quality of the signal by using at least one trained trainable model. The trainable model is trained on historical biological sensor data based on a supervised and/or semi-supervised deep learning architecture. The trainable model is trained by optimizing one loss function in terms of classification or two loss functions in terms of signal reconstruction and classification.
Specifically, the biological sensor may be configured for performing the method according to the present invention and/or for being used in the method according to the present invention. For definitions of the features of the biological sensor and for optional features of the biological sensor, reference may be made to one or more of the embodiments of the method as disclosed above or as disclosed in further detail below.
The term “processing unit” as generally used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to an arbitrary logic circuitry configured for performing basic operations of a computer or system and/or, generally, to a device which is configured for performing calculations or logic operations. In particular, the processing unit may be configured for processing basic instructions that drive the computer or system. As an example, the processing unit may comprise at least one arithmetic logic unit (ALU), at least one floating-point unit (FPU), such as a math co-processor or a numeric coprocessor, a plurality of registers, specifically registers configured for supplying operands to the ALU and storing results of operations, and a memory, such as an L1 and L2 cache memory. In particular, the processing unit may be a multi-core processor. Specifically, the processing unit may be or may comprise a central processing unit (CPU). Additionally or alternatively, the processing unit may be or may comprise a microprocessor, such that, specifically, the elements of the processing unit may be contained in one single integrated circuit (IC) chip. Additionally or alternatively, the processing unit may be or may comprise one or more application-specific integrated circuits (ASICs) and/or one or more field-programmable gate arrays (FPGAs) or the like. The processing unit specifically may be configured, such as by software programming, for performing one or more evaluation operations.
The biological sensor may be a portable photoplethysmogram device. The portable photoplethysmogram device may comprise at least one illumination source and at least one photodetector configured for providing at least one photoplethysmogram. The processing unit may be configured for classifying quality of the photoplethysmogram by using the trained trainable model.
Further disclosed and proposed herein is a computer program including computer-executable instructions for performing the method according to the present invention in one or more of the embodiments disclosed herein when the program is executed on a computer or computer network or a cloud. Specifically, the computer program may be stored on a computer-readable data carrier and/or on a computer-readable storage medium.
As used herein, the terms “computer-readable data carrier” and “computer-readable storage medium” specifically may refer to non-transitory data storage means, such as a hardware storage medium having stored thereon computer-executable instructions. The computer-readable data carrier or storage medium specifically may be or may comprise a storage medium such as a random-access memory (RAM) and/or a read-only memory (ROM).
Thus, specifically, one, more than one or even all of method steps a) and b) and optionally c) as indicated above may be performed by using a computer or a computer network or a cloud, preferably by using a computer program.
Further disclosed and proposed herein is a computer program product having program code means, in order to perform the method according to the present invention in one or more of the embodiments disclosed herein when the program is executed on a computer or computer network or a cloud. Specifically, the program code means may be stored on a computer-readable data carrier and/or on a computer-readable storage medium.
Further disclosed and proposed herein is a data carrier having a data structure stored thereon, which, after loading into a computer or computer network or a cloud, such as into a working memory or main memory of the computer or computer network or a cloud, may execute the method according to one or more of the embodiments disclosed herein.
Further disclosed and proposed herein is a computer program product with program code means stored on a machine-readable carrier, in order to perform the method according to one or more of the embodiments disclosed herein, when the program is executed on a computer or computer network or a cloud. As used herein, a computer program product refers to the program as a tradable product. The product may generally exist in an arbitrary format, such as in a paper format, or on a computer-readable data carrier and/or on a computer-readable storage medium. Specifically, the computer program product may be distributed over a data network.
Finally, disclosed and proposed herein is a modulated data signal which contains instructions readable by a computer system or computer network or a cloud, for performing the method according to one or more of the embodiments disclosed herein.
The method and devices according to the present invention may provide a number of advantages over known methods and devices of similar kind. Specifically, the present invention may provide an approach to detecting reliable or clean signals from a continuous PPG signal in a real world dataset during everyday life activities. Generally, assessing the quality of PPG signals may be technically challenging as only small amounts of labeled physiological signals and large amounts of unlabeled data are available. By using the method and devices according to the present invention, specifically by using the trainable model based on the semi-supervised deep learning architectures, it may be possible to leverage the large amount of unlabeled data. Thus, it may be possible to interpret the biological sensor data and to ensure that the trainable model learns information about the physiology of the signal by using signal reconstruction during the learning process of the trained model. Moreover, by using the trainable model, it may be possible to reconstruct the signal and at the same time classify the signal as a noisy or clean signal.
US 2019/133468 A1 describes classifying whether a person has atrial fibrillation. Thus, the quality assessment of US 2019/133468 A1 is related to a specific disease rather than applicable to PPG signals in general (e.g. as described in FIG. 5 of US 2019/133468 A1). US 2019/133468 A1 describes complex and time-consuming preprocessing steps (e.g. in FIG. 4 of US 2019/133468 A1), such as identifying movement using a different, non-PPG sensor (i.e., an IMU) and removing baseline signal levels. These preprocessing steps aim to make the quality detection easier. The present invention, in contrast, avoids such complex and time-consuming preprocessing steps, as their effect may be incorporated implicitly in the model. This can allow avoiding these extra steps before the actual model. The model used in US 2019/133468 A1 requires motion information as additional input, e.g. from additional sensors such as an accelerometer, which would make any prediction easier (see FIG. 6 of US 2019/133468 A1). The model according to the present invention does not require motion information as additional input.
Referring to the computer-implemented aspects of the invention, one or more of the method steps or even all of the method steps of the method according to one or more of the embodiments disclosed herein may be performed by using a computer or computer network or a cloud. Thus, generally, any of the method steps including provision and/or manipulation of data may be performed by using a computer or computer network or a cloud. Generally, these method steps may include any of the method steps, typically except for method steps requiring manual work, such as providing the samples and/or certain aspects of performing the actual measurements.
Specifically, further disclosed herein are:
Summarizing and without excluding further possible embodiments, the following embodiments may be envisaged:
Further optional features and embodiments will be disclosed in more detail in the subsequent description of embodiments, preferably in conjunction with the dependent claims. Therein, the respective optional features may be realized in an isolated fashion as well as in any arbitrary feasible combination, as the skilled person will realize. The scope of the invention is not restricted by the preferred embodiments. The embodiments are schematically depicted in the Figures. Therein, identical reference numbers in these Figures refer to identical or functionally comparable elements.
In the Figures:
The biological sensor 112 may be a portable photoplethysmogram device 120. The portable photoplethysmogram device 120 may comprise at least one illumination source 122 and at least one photodetector 124 configured for providing at least one photoplethysmogram 126. The processing unit 118 may be configured for classifying quality of the photoplethysmogram 126 by using the trained trainable model 119.
The biological sensor 112 may specifically be configured for performing the method for classifying quality of biological sensor data 110 and/or for being used in the method for classifying quality of biological sensor data 110. An exemplary embodiment of the method for classifying quality of biological sensor data 110 is shown in the flow diagram of
The method comprises the following steps which, as an example, may be performed in the given order. It shall be noted, however, that a different order is also possible. Further, it is also possible to perform one or more of the method steps once or repeatedly. Further, it is possible to perform two or more of the method steps simultaneously or in a temporally overlapping fashion. The method may comprise further method steps which are not listed.
The method comprises the following steps:
As outlined above, the biological sensor 112 may be the at least one portable photoplethysmogram device 120. The biological sensor data 110 may comprise the at least one photoplethysmogram 126 obtained by the portable photoplethysmogram device 120. As an example, the quality may be used as a quality indicator for heart rate variability data. The quality may be used for distinguishing between acceptable and non-acceptable heart rate variability data.
Further, either raw signals may be used, or processed or preprocessed signals may be used, thereby generating secondary signals, which may also be used as sensor signals 116. The method may comprise at least one pre-processing step (denoted by reference number 131) comprising one or more of filtering or normalizing the biological sensor data 110. As shown in
In the method, classifying quality may comprise discriminating between noisy and clean signals. Exemplary biological sensor data 110 are shown in
Turning back to
The trainable model 119 is trained on historical biological sensor data based on a supervised 134 and/or semi-supervised deep learning architecture 188. Exemplary embodiments of a supervised deep learning architecture 134 are shown in
The supervised deep learning architecture 134 may comprise at least one input layer 136 receiving the historical biosensor data and/or preprocessed historical biosensor data. For example, filtered PPG signals of 10 seconds each, sampled at 20 Hz, may be used as input (denoted by reference number 138). Thus, the input 138 may comprise a signal 116 comprising 200 values (10 s × 20 Hz), such as signals 116 exemplarily described in
In the method, manually labeled historical biological sensor data may be used for training the trainable model 119 based on the supervised deep learning architecture 134. As historical biological sensor data, a manually labeled PPG dataset may be used. For example, a training dataset of biological sensor data 110 may be set up as follows: Data were collected from 5 healthy volunteers (1 female and 4 male, average age 33 years) without any supervision, during their normal daily activities or during their night sleep. In total, 13547 non-overlapping PPG signal samples of 10 seconds length each were collected. The signals may be manually labeled by experts according to the instructions of Elgendi, M., “Optimal signal quality index for photoplethysmogram signals”, Bioengineering, 3(4), 2016. In this example, 8305 signals were categorized as noisy and 5242 as clean. For example, a balanced dataset of 9380 labeled signal samples may be used as training dataset.
The training step may comprise preprocessing the historical biological sensor data, e.g. filtering the PPG signals of 10 seconds each sampled at 20 Hz, i.e. 200 data points per signal. For training, a label may be provided for each input signal, with “0” indicating a noisy and “1” a clean signal.
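The preprocessing described above (non-overlapping 10 s windows at 20 Hz, filtering, normalizing, 0/1 labels) can be sketched as follows. This is a minimal illustration, not the filter actually used: the text does not specify the filter, so the FFT band-pass with a hypothetical 0.5 Hz to 5 Hz pass band and the z-score normalization are assumptions, as are all function names.

```python
import numpy as np

FS_HZ = 20                      # sampling frequency stated in the text
WINDOW_S = 10                   # window length stated in the text
WINDOW_LEN = FS_HZ * WINDOW_S   # 10 s * 20 Hz = 200 samples per input signal
LABELS = {"noisy": 0, "clean": 1}  # label encoding stated in the text

def segment(signal, window_len=WINDOW_LEN):
    """Split a 1-D recording into non-overlapping windows; a trailing partial window is dropped."""
    n_windows = len(signal) // window_len
    return np.asarray(signal[: n_windows * window_len], dtype=float).reshape(n_windows, window_len)

def bandpass_fft(window, fs=FS_HZ, low_hz=0.5, high_hz=5.0):
    """Hypothetical band-pass filter: zero every FFT bin outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(window)
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(window))

def normalize(window, eps=1e-8):
    """Assumed z-score normalization of a single window (zero mean, unit standard deviation)."""
    return (window - window.mean()) / (window.std() + eps)

def preprocess(signal):
    """Segment, filter and normalize a raw PPG recording into model inputs of shape (n, 200)."""
    return np.stack([normalize(bandpass_fft(w)) for w in segment(signal)])
```

For a 50 s recording this yields five inputs of 200 values each; a pulse-like 1.2 Hz component survives the filter, while a baseline offset and an 8 Hz component are removed.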
The supervised deep learning architecture 134 may comprise a plurality of convolutional layers 140, in particular a stack of convolutional layers 142. The convolutional layers may be designed with dilation, i.e. configured for dilated convolution. The supervised deep learning architecture 134 may comprise a WaveNet neural network, as described in further detail above. However, other deep neural networks, such as recurrent neural networks (RNNs) and/or long short-term memory networks (LSTMs), are also feasible. The supervised deep learning architecture 134 may comprise causal padding in each convolutional layer.
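To illustrate what dilation and causal padding do in such a stack, the following NumPy sketch implements a single-channel dilated causal convolution and the resulting receptive field. It is an illustration only: the dilation rates 1, 2, 4, 8, the single channel and the ReLU after every layer are assumptions, not the configuration of the architecture 134.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Single-channel dilated causal convolution.

    Output sample t is sum_i w[i] * x[t - i*dilation]; zeros are padded on the
    left only (causal padding), so the output has the same length as the input
    and never depends on future samples.
    """
    x = np.asarray(x, dtype=float)
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    taps = dilation * np.arange(k)  # offsets 0, d, 2d, ..., (k-1)d into the past
    y = np.empty(len(x))
    for t in range(len(x)):
        y[t] = np.dot(w, xp[t + pad - taps])
    return y

def dilated_stack(x, kernels, dilations):
    """Stack of causal dilated convolutions, each followed by a ReLU (assumed activation)."""
    h = np.asarray(x, dtype=float)
    for w, d in zip(kernels, dilations):
        h = np.maximum(causal_dilated_conv1d(h, w, d), 0.0)
    return h

def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With a kernel of size 3 and doubling dilations 1, 2, 4, 8, four layers already see 31 samples (about 1.5 s at 20 Hz); doubling the dilation per layer lets the receptive field grow exponentially with depth while the number of weights grows only linearly.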
In the exemplary embodiments shown in
A kernel of size 3, 5, 7 or even other sizes may be used. A regularization strength may be in the range from 0.0005 to 0.002, e.g. 0.0005, 0.001, 0.0015 or 0.002. However, other ranges are possible. As can be seen in
The supervised deep learning architecture 134 may use an Adam optimizer, e.g. with a learning rate of 0.00001 and with a decay in which the learning rate is halved every ten, every 100 or more epochs. However, other learning rates are possible. For example, for training, a batch size of 128 and from 50 up to 500 or even more epochs, e.g. up to 300 epochs, may be used. However, other batch sizes and numbers of epochs are possible.
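The optimizer settings above can be made concrete with a small sketch of the step decay (learning rate halved every N epochs) and of one Adam update. The Adam hyperparameters beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used defaults from Kingma and Ba and are an assumption here; in practice a deep learning framework's built-in optimizer would be used instead of this hand-written version.

```python
import numpy as np

def step_decay_lr(epoch, base_lr=1e-5, halve_every=10):
    """Learning rate that is halved every `halve_every` epochs (epochs counted from 0)."""
    return base_lr * 0.5 ** (epoch // halve_every)

def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count t and the moment estimates m and v."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

def n_batches(n_samples, batch_size=128):
    """Number of mini-batches per epoch (the last batch may be smaller)."""
    return -(-n_samples // batch_size)
```

With 9380 training samples and a batch size of 128, one epoch consists of 74 mini-batches; with `halve_every=100`, the learning rate stays at 1e-5 for the first 100 epochs and drops to 5e-6 afterwards.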
As shown in
The deep learning architecture, specifically the supervised deep learning architecture 134, may comprise as final layer, in particular the dense layer 168, an output layer 170 comprising one or two paths. The exemplary embodiments shown in
The first output 174 may be a class output providing the classified quality. The class output may use the first loss function. The first loss function may relate to classification loss. In the exemplary embodiment of
The second output 176 may be a mean squared error (MSE) output providing a measure for the difference between the input 138 and the reconstructed signal 184 after the convolutions. The MSE output may use a Rectified Linear Unit (ReLU) activation function. The MSE output may use an MSE loss function. The MSE loss function may relate to the difference between the reconstructed input signal and the input signal 138 in terms of mean squared error (MSE). A lower MSE indicates a better signal reconstruction. To estimate the MSE output, two extra dense layers, i.e. a first extra dense layer 178 and a second extra dense layer 180 as shown in
The method may comprise at least one validation step. The validation step may be performed during training of the trainable model 119. The validation step may be used for monitoring the improvement of the training. The validation step may comprise validating the trainable model 119 using at least one validation dataset. The validation dataset, for example, may comprise 1000 non-overlapping manually labeled samples of the historical biological sensor data collected from the 5 healthy volunteers, as described above, wherein these 1000 samples were not used for training.
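A minimal sketch of how disjoint training, validation and test subsets could be drawn, and how a balanced training set could be obtained by undersampling the majority class. The random permutation, the undersampling strategy and the function names are assumptions for illustration; the sketch is not claimed to reproduce the 9380-sample balanced set described above.

```python
import numpy as np

def split_indices(n_samples, n_val=1000, n_test=1000, seed=0):
    """Randomly split sample indices into disjoint train, validation and test subsets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    val = perm[:n_val]
    test = perm[n_val:n_val + n_test]
    train = perm[n_val + n_test:]
    return train, val, test

def balance_by_undersampling(indices, labels, seed=0):
    """Keep equally many noisy (0) and clean (1) samples by subsampling the larger class.

    `labels` is indexed by global sample index, `indices` selects the pool to balance.
    """
    rng = np.random.default_rng(seed)
    indices = np.asarray(indices)
    labels = np.asarray(labels)
    per_class = [indices[labels[indices] == c] for c in (0, 1)]
    n = min(len(p) for p in per_class)
    chosen = [rng.choice(p, size=n, replace=False) for p in per_class]
    return np.sort(np.concatenate(chosen))
```

Because validation and test indices are removed before balancing, none of the held-out samples can leak into training.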
The method may comprise at least one test step, wherein the test step comprises testing the trained trainable model 119. The test step may comprise testing the trained trainable model 119 on at least one test dataset. The test step may comprise obtaining performance characteristics of the trained trainable model 119, e.g. precision, recall, F1-score, area under the curve (AUC).
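The performance characteristics named above can be computed as in the following sketch, in which "clean" (1) is treated as the positive class, which is an assumption. The AUC is computed as the probability that a randomly chosen clean sample receives a higher score than a randomly chosen noisy one (ties count one half), which equals the area under the ROC curve; library implementations would normally be used instead.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1-score with clean (1) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of clean and noisy scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

For labels [0, 0, 1, 1] and model outputs [0.1, 0.4, 0.35, 0.8], the AUC is 0.75; with the 0.5 threshold only the last sample is predicted clean, giving precision 1.0, recall 0.5 and an F1-score of 2/3.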
For example, 1000 non-overlapping manually labeled samples of the historical biological sensor data collected from the 5 healthy volunteers, as described above, were used as test data. These 1000 samples were not used for training. As a result, the following accuracy was found, wherein a classification threshold of 0.5 was used:
For testing, the combination of epoch and regularization strength yielding the highest F1-score was used, in order to maintain a balance between false positives and false negatives. The classification threshold was selected considering that the trained trainable model 119 outputs a value in the interval [0, 1], with 0 meaning a noisy signal and 1 a clean signal. With a classification threshold of 0.5, the signal 116 is regarded as noisy if the output value of the trained trainable model 119 is below 0.5, and as clean otherwise. This classification threshold may vary. Techniques for finding the optimal classification threshold are known to the skilled person, e.g. based on receiver operating characteristic (ROC) curves.
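The thresholding rule above and one ROC-based way of choosing a threshold can be sketched as follows. Maximizing Youden's J statistic (true positive rate minus false positive rate) is only one of several common criteria and is an assumption here; the text does not state which criterion was used to obtain the optimal threshold.

```python
import numpy as np

def classify(scores, threshold=0.5):
    """Below the threshold: noisy (0); otherwise: clean (1)."""
    return (np.asarray(scores) >= threshold).astype(int)

def youden_threshold(y_true, scores):
    """Return the candidate threshold maximizing TPR - FPR, and that maximum.

    Every distinct model output is tried as a threshold; on ties the lowest
    threshold is kept.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = np.sum(pred & (y_true == 1)) / n_pos
        fpr = np.sum(pred & (y_true == 0)) / n_neg
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j
```

In practice the threshold would be chosen on the validation data and then kept fixed when reporting results on the test data.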
Additionally or alternatively, a completely independent labeled dataset may be used for testing the trained trainable model 119. For example, the test data may be collected from participants other than those whose data were used for training the model 119. For example, 1000 non-overlapping samples from the dataset as described in “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671 may be used as test dataset. As a result, the following accuracy was found, wherein a classification threshold of 0.5 was used:
The following accuracy was found in case of an optimal classification threshold:
In the
The exemplary reconstructed signals 184 in
In the example of
In the example of the semi-supervised deep learning architecture 188, the method may comprise introducing an extra input parameter z_input (denoted by reference number 190) for training based on an unlabeled dataset. This may allow handling the “missing” labels. The extra input parameter 190 may be a binary value indicating whether the specific data are labeled or not. For example, the extra input parameter 190 may be “0” for unlabeled data and “1” for labeled data. The extra input parameter 190 may be multiplied with the class output during the learning process such that the class learning is not affected by unlabeled data.
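The effect of z_input can be illustrated with a combined loss in which the per-sample classification term is multiplied by z_input, so that unlabeled samples (z_input = 0) contribute only to the reconstruction term. This is a sketch under assumptions: binary cross-entropy as the classification loss, averaging over the whole batch and the parameter `recon_weight` are illustrative choices, and masking the loss is a common equivalent of multiplying z_input with the class output as described in the text, not necessarily the exact implementation.

```python
import numpy as np

def joint_loss(y_true, p_clean, z_input, x, x_rec, recon_weight=1.0, eps=1e-7):
    """Masked classification loss plus weighted reconstruction (MSE) loss.

    y_true: labels (0 noisy, 1 clean; any placeholder value for unlabeled rows)
    p_clean: class output in [0, 1]; z_input: 1 labeled, 0 unlabeled
    x, x_rec: input and reconstructed signals, shape (batch, samples)
    """
    y = np.asarray(y_true, dtype=float)
    z = np.asarray(z_input, dtype=float)
    p = np.clip(np.asarray(p_clean, dtype=float), eps, 1 - eps)
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # assumed classification loss
    class_term = np.mean(z * bce)  # unlabeled rows contribute zero
    mse = np.mean((np.asarray(x, float) - np.asarray(x_rec, float)) ** 2, axis=1)
    return class_term + recon_weight * np.mean(mse)
```

For an unlabeled sample the loss is therefore independent of the (meaningless) placeholder label, while the reconstruction error still drives learning.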
Thus, as can be seen in
Manual labeled and unlabeled historical biological sensor data may be used for training the trainable model 119 based on the semi-supervised deep learning architecture 188. For unlabeled biological sensor data, the trainable model 119 may be trained by optimizing the loss function in terms of signal reconstruction and by disregarding the loss function in terms of classification. For training, the same balanced labeled dataset as with the supervised model 134 may be used, and in addition, unlabeled data described in the following:
For example, in particular in addition to the labeled PPG dataset described above, an unlabeled PPG dataset may be used as historical biological sensor data for training. For example, the unlabeled PPG dataset may be set up as follows: Data were collected from 20 healthy volunteers (4 female and 16 male, average age 32 years) while performing a series of activities in a supervised manner, in which the participant switched activities every 5 minutes. For example, a protocol comprising multiple activities may be used, such as screening and informed consent process (while sitting, at rest), placement of ECG and PPG sensors (while sitting, at rest), baseline (sitting, at rest), paced breathing (ladder of increasing respiratory frequencies from 5 to 20 breaths per minute in steps of 5), 5 minutes of console gameplay (PS4 Aaero), orthostasis (standing, otherwise at rest), mental stress manipulation (Serial 7s [subtraction by 7 from 700, with eyes closed, pronouncing each response aloud]; e.g. as described in Ewing et al., 1992), physical activity manipulation (uninterrupted indoor walking along a pre-set circular path; same path for all subjects), baseline (sitting, at rest), and retrieval of PPG/ECG equipment and debriefing.
The activities included sitting in a resting position, paced breathing, console gameplay, orthostasis, mental stress manipulation, physical activity, and again sitting in a resting position. Some activities are suspected to introduce different levels of motion artifacts (e.g., physical activity, orthostasis and console gameplay), while others increase the heart rate and modify the PPG waveform (e.g., paced breathing). For example, in total 37564 non-overlapping PPG signal samples of 10 seconds length each were collected. For more details on the collection of the unlabeled dataset, reference is made to “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671.
As described above, the training dataset for the semi-supervised model 188 may comprise labeled and unlabeled historical biological sensor data. For example, the 9380 balanced labeled signal samples may be used and, additionally, the collected unlabeled samples may be used.
As described above, the method may comprise the at least one validation step. The validation dataset for validating the trainable model using the semi-supervised deep learning architecture 188, for example, may comprise 1000 non-overlapping manually labeled samples of the historical biological sensor data collected from the 5 healthy volunteers, as described above, wherein these 1000 samples were not used for training.
As described above, the method may comprise the at least one test step. For example, for testing the trained trainable model 119 based on the semi-supervised deep learning architecture 188, 1000 non-overlapping manually labeled samples of the historical biological sensor data collected from the 5 healthy volunteers, as described above, may be used as test data. These 1000 samples were not used for training. The accuracy of the semi-supervised model on this test dataset from the 5 volunteers is:
Additionally or alternatively, other test data may be used for testing. For example, 1000 non-overlapping samples from the unlabeled dataset collected as described above and in “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671 may be used as test dataset. The samples used for testing were manually annotated. For example, the test dataset may comprise 796 noisy and 204 clean PPG signals. As a result, the following accuracy was found, wherein a classification threshold of 0.5 was used:
The following accuracy was found in case of an optimal classification threshold:
The performance of the model 119 may be compared to the performance of a multivariate quality metric, such as proposed in “A quality metric for heart rate variability from photoplethysmogram sensor data”, of M. Zanon et al., PMID: 33018085, DOI: 10.1109/EMBC44109.2020.9175671. The PPG signal was collected simultaneously with the ECG signal in order to compare the derived HRV features and, eventually, to estimate an HRV quality metric per signal, as described in Zanon et al., 2020. The HRV quality metric was computed for each PPG sample signal, and a PPG signal is regarded as trustworthy (i.e., clean) if the HRV quality metric value is below 20. It was found that the semi-supervised model performs better than the multivariate quality metric.
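The comparison with the multivariate quality metric amounts to turning the per-signal HRV quality metric into a clean/noisy decision with the cutoff of 20 stated above and scoring both decision rules against the same manual labels. The sketch below shows only this bookkeeping; the metric values themselves would come from the procedure of Zanon et al., which is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

HRV_QUALITY_CUTOFF = 20.0  # cutoff stated in the text

def labels_from_hrv_metric(metric_values, cutoff=HRV_QUALITY_CUTOFF):
    """Clean (1) if the HRV quality metric is below the cutoff, noisy (0) otherwise."""
    return (np.asarray(metric_values, dtype=float) < cutoff).astype(int)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the manual label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

The accuracy of `labels_from_hrv_metric(metric)` against the manual annotations can then be compared directly with the accuracy of the thresholded model outputs on the same samples.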
The training based on the semi-supervised deep learning architecture 188 may comprise, firstly, training with the dataset of labeled data to learn the class label and information relevant for signal reconstruction. Subsequently, the training may comprise training only with a random subset of the unlabeled data to better learn the signal reconstruction. The training using the unlabeled data may further introduce signal physiology that was not included in the labeled training set, e.g. from different people or different activities. Annotating data is expensive, and very few annotated datasets are available, whereas a lot of unlabeled data is available. Semi-supervised learning may leverage unlabeled data and make the most efficient use of small amounts of labeled data.
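The two training phases described above can be sketched as a batch schedule: first all labeled samples (z_input = 1), then a random subset of the unlabeled samples (z_input = 0). The size of the unlabeled subset (`unlabeled_fraction`), the single pass per phase and the generator interface are assumptions made for illustration only.

```python
import numpy as np

def two_phase_schedule(n_labeled, n_unlabeled, unlabeled_fraction, batch_size, rng):
    """Yield (phase, sample_indices, z_input) batches for the two training phases.

    Phase 1 shuffles and covers all labeled samples with z_input = 1.
    Phase 2 covers a random subset of the unlabeled samples with z_input = 0,
    so only the reconstruction loss is optimized for these batches.
    """
    order = rng.permutation(n_labeled)
    for start in range(0, n_labeled, batch_size):
        idx = order[start:start + batch_size]
        yield "labeled", idx, np.ones(len(idx))
    n_sub = int(round(unlabeled_fraction * n_unlabeled))
    subset = rng.choice(n_unlabeled, size=n_sub, replace=False)
    for start in range(0, n_sub, batch_size):
        idx = subset[start:start + batch_size]
        yield "unlabeled", idx, np.zeros(len(idx))
```

A training loop would feed each batch, together with its z_input vector, to a combined loss in which z_input masks the classification term.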
For both the supervised 134 and the semi-supervised deep learning architecture 188, the use of two loss functions may allow the network to jointly learn to classify the labeled signals, i.e. to distinguish clean from noisy signals, and to learn more about the physiology of the signal 116.
Further, in the method, the trained trainable model 119 may be trained based on a combination 216 (not shown in
In the
When comparing the signal reconstruction of the semi-supervised 188 (
As outlined above, the method as described with respect to
Further,
A set of evaluation metrics for the supervised deep learning architecture 134 and the semi-supervised deep learning architecture 188 is shown in Table 1. Specifically, Table 1 shows the results for the supervised deep learning architecture 134 trained by optimizing one loss function in terms of classification 220, as exemplarily shown in
As outlined above, when referring to
Varying the learning rate and the decay may further increase the accuracy. For example, with a learning rate of 0.00001, increasing the number of epochs after which the learning rate is halved from 10 to 100 may result in an increase in accuracy. For example, the accuracy may increase by 4.2% for the supervised deep learning architecture 134 trained by optimizing one loss function in terms of classification 220 with equal weights.
The confusion matrices for these architectures are shown in Tables 2 to 5. The columns of the confusion matrices indicate the labeled classification, whereas the rows indicate the classified quality obtained by using the trained model 119 with the respective architecture.
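The orientation used in these tables (rows: classified quality, columns: label) is the transpose of some common library conventions; scikit-learn's `confusion_matrix`, for example, puts the true labels in the rows. A minimal sketch with the orientation used here:

```python
import numpy as np

def confusion_matrix_pred_rows(y_true, y_pred, n_classes=2):
    """Confusion matrix with rows = classified quality, columns = label (0 noisy, 1 clean)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[p, t] += 1
    return cm
```

The accuracy is the trace divided by the total count; for labels [0, 0, 1, 1] and predictions [0, 1, 1, 1] the matrix is [[1, 0], [1, 2]], i.e. one noisy signal classified as clean and an accuracy of 0.75.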
Further,
In this Example, in which 50% of the training data is used, as outlined above, the set of evaluation metrics for the supervised deep learning architecture 134 and the semi-supervised deep learning architecture 188 is shown in Table 7.
The confusion matrices for these architectures are shown in Tables 8 to 12.
Using the semi-supervised architecture 188 may allow expanding and/or transferring the proposed model to a new dataset. For example, if the model needs to be adapted to a new scenario with less data, e.g. less than 50% of the data used for training the originally trained model, because there are not enough labeled examples, it is possible to train a reliable model using the semi-supervised approach. It was found that the accuracy remains above 90% with all algorithms when the optimal weight for the reconstruction loss is applied.
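A reduced training set such as the 50% example above could be drawn per class, so that the noisy/clean ratio of the full training set is preserved. Stratified sampling is an assumption here; the text only states the fraction of training data used.

```python
import numpy as np

def stratified_subsample(indices, labels, fraction=0.5, seed=0):
    """Draw `fraction` of the samples of each class without replacement.

    `labels` is indexed by global sample index, `indices` selects the pool.
    """
    rng = np.random.default_rng(seed)
    indices = np.asarray(indices)
    labels = np.asarray(labels)
    parts = []
    for c in np.unique(labels[indices]):
        members = indices[labels[indices] == c]
        k = int(round(fraction * len(members)))
        parts.append(rng.choice(members, size=k, replace=False))
    return np.sort(np.concatenate(parts))
```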
In
As can be seen in
| Number | Date | Country | Kind |
|---|---|---|---|
| 21192193.7 | Aug 2021 | EP | regional |
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/EP2022/073126 | Aug 2022 | WO |
| Child | 18581036 | | US |