This description relates generally to data analysis, and more particularly to denoising and data fusion of biophysiological rate features.
Data analysis generally encompasses processes of collecting, cleaning, processing, transforming, and modeling data with the goal, for example, of accurately describing the data, discovering useful information or features among the data, suggesting conclusions, or supporting decision-making. Data analysis typically includes systematically applying statistical or logical techniques to describe, condense, illustrate and evaluate data. Various analytic techniques facilitate distinguishing the signal or phenomenon of interest from unrelated noise and uncertainties inherent in observed data.
Sensor data fusion techniques typically provide higher-level information from data observed at multiple sensors, for example, by employing spatio-temporal data integration and exploiting redundant and complementary information, as well as available context. Exploratory data analysis often applies quantitative methods, such as outlier detection, in an attempt to identify and eliminate inaccurate data. In addition, descriptive statistics, such as the statistical mean, median, variance or standard deviation, may be generated to help interpret the data. Further, data visualization may also be used to examine the data in graphical format, providing insight regarding the information embedded in the data.
In general, statistical hypothesis testing, or confirmatory data analysis, employs statistical inference to determine if a result is significant based on a confidence interval or threshold probability. Model selection techniques may be employed to determine the most appropriate model from multiple hypotheses. Decision theory and optimization techniques, including chi-square testing, may further be employed to select the best of multiple descriptive models. Statistical inference methods include, but are not limited to, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the focused information criterion (FIC), the deviance information criterion (DIC), and the Hannan-Quinn information criterion (HQC).
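As an illustration of criterion-based model selection, the following Python sketch compares two candidate models by AIC and BIC. The model names, log-likelihood values, parameter counts, and sample size are hypothetical, chosen only to show the computation.

```python
import math

def aic(log_likelihood, k):
    # AIC = 2k - 2 ln(L), where k is the number of model parameters
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # BIC = k ln(n) - 2 ln(L), where n is the number of observations
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical candidates: (maximized log-likelihood, parameter count).
models = {"simple": (-120.0, 2), "complex": (-118.5, 5)}
n = 100  # assumed number of observations

# The model with the lowest criterion value is preferred.
best = min(models, key=lambda m: bic(models[m][0], models[m][1], n))
```

Here BIC's stronger complexity penalty favors the simpler model despite its slightly lower likelihood.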
A photoplethysmogram (PPG) is an optically obtained plethysmogram, or volumetric measurement of an organ. The pulse oximeter, a type of PPG sensor, illuminates the skin with one or more colors of light and measures changes in light absorption at each wavelength. The PPG sensor illuminates the skin, for example, using an optical emitter, such as a light-emitting diode (LED), and measures either the amount of light transmitted through a relatively thin body segment, such as a finger or earlobe, or the amount of light reflected from the skin, for example, using a photodetector, such as a photodiode. PPG sensors have been used to monitor respiration and heart rates, blood oxygen saturation, hypovolemia, and other circulatory conditions.
Conventional PPGs typically monitor the perfusion of blood to the dermis and subcutaneous tissue of the skin, which may be used to detect, for example, the change in volume corresponding to the pressure pulses of consecutive cardiac cycles of the heart. If the PPG is attached without compressing the skin, a secondary pressure peak may also be seen from the venous plexus. A microcontroller typically processes and calculates the peaks in the waveform signal to count heart beats per minute (bpm).
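The peak-counting step described above can be sketched as follows. This minimal Python example assumes a uniformly sampled waveform and a simple mean-crossing threshold; a practical microcontroller implementation would add debouncing and adaptive thresholding.

```python
def count_bpm(samples, fs):
    """Count local maxima above the signal mean and convert to beats per minute.

    samples: sequence of PPG amplitude values; fs: sampling rate in Hz.
    A minimal sketch, not a production peak detector.
    """
    mean = sum(samples) / len(samples)
    peaks = 0
    for i in range(1, len(samples) - 1):
        # A peak is a sample above the mean that exceeds its left neighbor
        # and is at least as large as its right neighbor (ties count once).
        if samples[i] > mean and samples[i] > samples[i - 1] and samples[i] >= samples[i + 1]:
            peaks += 1
    duration_min = len(samples) / fs / 60.0
    return peaks / duration_min
```

Applied to a clean 1.2 Hz waveform, this yields 72 bpm; real PPG signals require the artifact handling discussed below.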
However, signal noise from sources unrelated to desired features, including, for example, motion artifacts and electrical signal contamination, has proven to be a limiting factor affecting the accuracy of PPG sensor readings. While the signal noise from sources unrelated to desired features may be avoided in a clinical environment, this signal noise may have an undesirable effect on PPG sensor readings taken in free living conditions, for example, during exercise. As a result, some existing data analysis methodologies may have drawbacks when used with PPG sensor readings taken in free living conditions.
According to one embodiment, a device includes a memory that stores machine instructions and a processor coupled to the memory that executes the machine instructions to receive a plurality of feature data points and extract a feature from a feature data point of the plurality of feature data points that satisfies a predetermined range. The processor further executes the machine instructions to perform a plurality of hypothesis tests to determine whether the feature corresponds to each of a plurality of predetermined hypothesis distributions comprising a first hypothesis distribution. If the feature corresponds to the first hypothesis distribution, the processor further executes the machine instructions to qualify the feature as a qualified estimate of an actual feature.
According to another embodiment, a method includes receiving a plurality of feature data points and extracting a feature from a feature data point of the plurality of feature data points that satisfies a predetermined range. The method further includes performing a plurality of hypothesis tests to determine whether or not the feature corresponds to each of a plurality of predetermined hypothesis distributions comprising a first hypothesis distribution. The method also includes qualifying the feature as a qualified estimate of an actual feature if the feature corresponds to the first hypothesis distribution.
According to yet another embodiment, a computer program product includes a non-transitory, computer-readable storage medium encoded with instructions adapted to be executed by a processor to implement receiving a plurality of feature data points and extracting a feature from a feature data point of the plurality of feature data points that satisfies a predetermined range. The instructions are further adapted to implement performing a plurality of hypothesis tests to determine whether or not the feature corresponds to each of a plurality of predetermined hypothesis distributions comprising a first hypothesis distribution. The instructions are also adapted to implement qualifying the feature as a qualified estimate of an actual feature if the feature corresponds to the first hypothesis distribution.
The details of one or more embodiments of the present disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the present disclosure will be apparent from the description and drawings, and from the claims.
The data points include a data fusion from multiple sources, which may correspond to different features derived from the same underlying sensors or from different sensors. For example, the data points include feature data regarding a subject's heart rate and respiration rate observed over time using photoplethysmogram (PPG) sensors, such as pulse oximeters. In one embodiment, the PPG sensor and the biophysiological periodic data analyzer may be embedded in a wearable device that is fastened to a subject, for example, at the subject's head, foot, finger, or wrist.
The feature receiver 12 sorts the monitored feature data points and places the data points in order, for example, feature-by-feature. The feature receiver 12 outputs each ordered data point along with a synchronous time output. The rate calculator 14 uses the most recent data point and a corresponding time output to calculate the current feature rate based on a series of recent data points.
The outlier eliminator 16 determines whether the current feature rate falls within an acceptable range based on a set of predetermined biological limits regarding the feature, for example, minimum and maximum rate limits. A current feature rate that falls outside the acceptable range is not used in further calculations. The recent rate calculator 18 uses a series of current feature rates within the acceptable range during a desired window of time to calculate an updated recent feature rate.
The outlier eliminator 16 imposes constraints on the hypotheses based on biophysiological limits. For example, a minimum limit (‘minHR’) and a maximum limit (‘maxHR’) may be based on the realistic expected range of human heart rates. Similarly, minimum and maximum relative limits (‘+/−deltaHR’) centered around the recently observed heart rate value (uRecent) may be based on physiological limitations regarding the rate of change of the heart rate over the sampling time.
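These limit checks can be sketched in Python as follows. The names minHR, maxHR, deltaHR, and uRecent follow the description above; the numeric threshold values are illustrative assumptions, not values specified by this description.

```python
MIN_HR = 30.0     # minHR: minimum plausible human heart rate, bpm (illustrative)
MAX_HR = 220.0    # maxHR: maximum plausible human heart rate, bpm (illustrative)
DELTA_HR = 15.0   # +/- deltaHR: maximum plausible change per sample (illustrative)

def within_limits(rate, u_recent):
    """Return True when a candidate rate passes both the absolute
    biophysiological limits and the relative rate-of-change limits
    centered on the recently observed value uRecent."""
    if not (MIN_HR <= rate <= MAX_HR):
        return False
    return abs(rate - u_recent) <= DELTA_HR
```

A candidate of 75 bpm against a recent value of 70 bpm passes, while 250 bpm fails the absolute limit and a jump from 70 to 100 bpm fails the relative limit.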
The rate filter 20 performs statistical calculations on qualified feature data from the biosemantic binary qualifier 24, which is further explained below.
The fusion at the hypothesis level follows an approach equivalent to that used in the generic multiple-model adaptive estimation framework, as described in the context of Kalman filters by P. D. Hanlon and P. S. Maybeck in “Multiple-Model Adaptive Estimation Using a Residual Correlation Kalman Filter Bank,” IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-36, No. 2, April 2000, pp. 393-406, the entirety of which is incorporated herein by reference. The Kalman filter estimation involves an estimate and an uncertainty of the state of the system. For instance, in an embodiment, an unscented Kalman filter associated with alternate hypotheses of system behavior is used, which explicitly fits a distribution from deterministic sampling of the input, as described in Simon J. Julier & Jeffrey K. Uhlmann, “A new extension of the Kalman filter to nonlinear systems”, Int. Symp. Aerospace/Defense Sensing, Simul. and Controls, vol. 3, p. 182, 1997, the entirety of which is incorporated herein by reference.
The biosemantic binary qualifier 24 determines qualified data, or qualifies data, by applying a binary selection criterion to each input feature based on compatibility with learned probabilistic models, which may be developed by any of a variety of methods. The binary selection approach handles input data even when there is a large fraction of anomalies, or uncertainty, in the feature data. The biosemantic binary qualifier 24 includes, for example, a maximum likelihood decision engine. The biosemantic binary qualifier 24 produces qualified data as output.
In an embodiment, the biosemantic binary qualifier 24 uses the recent rate along with the filtered and unfiltered rates of change to perform a hypothesis testing method 40. Multiple hypothetical models are considered for each observed data point, and the decision to accept the point is made based on a decision rule for each hypothesis. The model hypotheses incorporate both limits on rates of change and hard limits on the values of the inputs, grounded in biophysiological constraints. Each hypothesis transforms the input feature differently, depending on the nature of the hypothesis.
Referring to
The biosemantic binary qualifier 24 tests each of the hypotheses on the basis of a probabilistic test. For instance, in the case of the first hypothesis type described, both the recent distribution 52 and the candidate point 58 are available. Therefore, the computation of the a posteriori likelihood of the point being derived from the distribution is used to represent the a posteriori likelihood of the associated hypothesis.
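Assuming a normal model of the recent distribution, the likelihood of a candidate point may be computed as in the following sketch; the description does not fix the form of the probabilistic model, so the Gaussian is an illustrative choice.

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """Likelihood of candidate point x under a normal model of the recent
    distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
```

A candidate near the recent mean receives a higher likelihood than one far from it, which is the quantity the qualifier compares against its decision threshold.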
Each hypothesis is considered independently—on the basis of its own test against a null hypothesis. For instance, a hypothesis is based on exceeding a threshold in a log-likelihood ratio test, or in exceeding a threshold with respect to the affinity to the distribution associated with the hypothesis. Following this, all hypotheses which overcome the null hypothesis are ranked based on an a priori ranking among hypotheses and the highest ranked hypothesis is selected. This has the advantage that diverse hypothesis types may be considered—some with an explicit probability model for which likelihood may be computed, but others using logical triggers for which no explicit probability model exists.
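The rank-based selection among surviving hypotheses can be sketched as follows. The tuple layout and hypothesis names are illustrative; each hypothesis's test against the null hypothesis (for example, a log-likelihood ratio exceeding a threshold) is assumed to have already produced a boolean result.

```python
def select_hypothesis(tests):
    """tests: list of (name, rank, passed) tuples, where `passed` is the
    result of that hypothesis's own test against the null hypothesis and
    lower `rank` means higher a priori priority. Only hypotheses that
    overcome the null hypothesis survive; the best-ranked survivor wins.
    Returns None when every hypothesis fails its test."""
    survivors = [(rank, name) for name, rank, passed in tests if passed]
    return min(survivors)[1] if survivors else None
```

This structure allows mixing hypotheses with explicit probability models and hypotheses driven by logical triggers, since only a pass/fail result and a ranking are required.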
Thus, these statistics are combined among the different data sources, and then applied across each of the hypotheses. Alternatively, separate statistics may be calculated associated with each data type and these may be selectively attached to different hypotheses.
In an alternate embodiment in which all of the hypotheses have explicit probabilities, the hypothesis selection may proceed by computing the relative likelihood of each hypothesis and selecting the most likely hypothesis as correct. This triggers certain logic, as described below, to either accept or reject the candidate point.
For example, the feature data point may be accepted as measured, based on a relatively high correlation to the hypothesis associated with the recent distribution 52. Otherwise, the feature modifier 26 may modify the feature data point before it is accepted, for example, based on a relatively high correlation to the hypothesis associated with the trial distribution 54. On the other hand, the feature data point may be dropped from the output stream, based on a relatively high correlation to the hypothesis associated with the artifact distribution 56.
The filter generator 28 updates the rate filter 20 and provides feedback to the biosemantic binary qualifier 24 to develop the model hypotheses. The model hypotheses are stochastic processes, which calculate the increases in uncertainty associated with the time-sensitivity of information gathered. If no recent feature data has been explained, the uncertainty grows. In an embodiment, the statistics calculation implements, for example, a Langevin correction. This modifies the probability model to account for the time value of data by growing the model variance with the time gap period. In an embodiment, the Langevin model, which is based on physical models of Brownian motion, grows the model variance linearly with time.
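The linear variance growth of the Langevin correction can be sketched as follows; the diffusion coefficient is a hypothetical per-second parameter, not a value specified by this description.

```python
def langevin_variance(base_variance, diffusion_rate, time_gap):
    """Grow the model variance linearly with the time since the last
    explained data point, as in a Brownian-motion (Langevin) model.

    base_variance: variance of the probability model at the last update.
    diffusion_rate: assumed variance growth per second.
    time_gap: seconds elapsed without explained feature data.
    """
    return base_variance + diffusion_rate * time_gap
```

As the time gap grows, the widened variance makes the model progressively more willing to accept candidate points that deviate from the last estimate.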
Referring to
A set of fixed, or absolute, biophysiological limits regarding the features are received at 74, and a determination is made at 76, regarding whether the rate and/or trial rate at 72 fall within an acceptable range defined by the biophysiological limits. If the rate and/or trial rate at 72 are found to be within the acceptable range at 76, the process continues at 80 of
Referring to
In addition to the absolute limits applied at 76, the present method also detects conditions in which limits on the allowable rate of change have been exceeded. A dynamic limit is computed from the statistics of the recent time window, such as a confidence interval. For example, a ninety-percent, ninety-two-percent, or ninety-five-percent confidence interval is applied based on a probabilistic model fit with respect to the previous window.
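The dynamic limit can be sketched as follows, assuming a normal model fit to the recent window; the z-values are the standard two-sided normal quantiles for the confidence levels mentioned above.

```python
import math

def dynamic_limits(window, z=1.645):
    """Compute a dynamic acceptance interval from the recent window's sample
    statistics. z = 1.645 corresponds to a ninety-percent two-sided interval
    under a normal model (about 1.75 for ninety-two percent, 1.96 for
    ninety-five percent)."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / (n - 1)  # sample variance
    sd = math.sqrt(var)
    return mean - z * sd, mean + z * sd
```

Candidate rates falling outside the returned interval would be flagged as exceeding the allowable rate of change relative to the previous window.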
Statistical feedback data from
Statistical hypothesis testing and data fusion are performed at 94, for example, by a maximum likelihood decision engine (biosemBinaryQualifier, or BBQ), to determine the event type based on the biophysiological limits at 74, the recent rate at 86, the delta rate at 84, the filter delta rate and the trial delta filter rate at 92, and statistical feedback data at 112 from
Referring to
If the event type at 96 is determined to belong to a hypothesis category, type 0, no further processing is performed regarding the event type at 102. If the event type at 96 is determined to belong to a hypothesis category, type 1, the feature is passed along without modification at 104. If the event type at 96 is determined to belong to a hypothesis category, type 2, the feature is modified according to a suitable model at 106.
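The three-way dispatch can be sketched as follows; the `modify` callable is a placeholder standing in for whatever model-based correction applies at 106.

```python
def dispatch(event_type, feature, modify):
    """Route a feature according to its hypothesis category:
    type 0 -> drop (no further processing),
    type 1 -> pass along unmodified,
    type 2 -> modify according to a suitable model (caller-supplied)."""
    if event_type == 0:
        return None
    if event_type == 1:
        return feature
    if event_type == 2:
        return modify(feature)
    raise ValueError("unknown event type")
```

Dropped features return None and are simply absent from the qualified output stream.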
At 108, the feature outputs at 104 and 106 are combined with the time at 70 of
In addition, in an alternative embodiment, the final result may be temporally smoothed to improve the precision, albeit at the expense of responsiveness. For example, the feature stream may be estimated using various data smoothing approaches, including, for example, a boxcar moving average filter, an exponential moving average filter, or the like. In this manner, the qualified feature stream and the smoothed feature stream provide two estimates of the true heart rate of a subject over time based on the measured heart rate data represented by the feature data streams.
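One of the mentioned smoothing approaches, the exponential moving average, can be sketched as follows; the smoothing factor is an illustrative assumption.

```python
def exponential_moving_average(stream, alpha=0.2):
    """Smooth a qualified feature stream with an exponential moving average.

    alpha: smoothing factor in (0, 1]; smaller values smooth more heavily
    at the cost of responsiveness, reflecting the precision/responsiveness
    trade-off noted above.
    """
    smoothed = []
    for x in stream:
        if not smoothed:
            smoothed.append(x)  # seed with the first qualified value
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

A constant input passes through unchanged, while a step change is tracked gradually rather than instantaneously.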
Statistical data is computed based on the qualified feature with regard to a corresponding window of time at 112, and the filter criteria are developed to update the recent rate filter at 88 in
As illustrated in
The computing device 120 may be used, for example, to implement the method of analyzing biophysiological periodic data of
Aspects of this disclosure are described herein with reference to flowchart illustrations or block diagrams, in which each block or any combination of blocks may be implemented by computer program instructions. The instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to effectuate a machine or article of manufacture, and when executed by the processor the instructions create means for implementing the functions, acts or events specified in each block or combination of blocks in the diagrams.
In this regard, each block in the flowchart or block diagrams may correspond to a module, segment, or portion of code that includes one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functionality associated with any block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or blocks may sometimes be executed in reverse order.
A person of ordinary skill in the art will appreciate that aspects of this disclosure may be embodied as a device, system, method or computer program product. Accordingly, aspects of this disclosure, generally referred to herein as circuits, modules, components or systems, may be embodied in hardware, in software (including firmware, resident software, micro-code, etc.), or in any combination of software and hardware, including computer program products embodied in a computer-readable medium having computer-readable program code embodied thereon.
It will be understood that various modifications may be made. For example, useful results still could be achieved if steps of the disclosed techniques were performed in a different order, and/or if components in the disclosed systems were combined in a different manner and/or replaced or supplemented by other components. Accordingly, other implementations are within the scope of the following claims.
This application claims the benefit of U.S. Provisional Application No. 62/110,263, filed Jan. 30, 2015; U.S. Provisional Application No. 62/112,032, filed Feb. 4, 2015; and U.S. Provisional Application No. 62/113,092, filed Feb. 6, 2015, which are incorporated by reference herein.