1. Field
This patent application relates to software designed to help people design and use monitoring systems to get the maximum possible information for minimum cost in hardware, software, human time, and data communications and storage. Applications include virtually any repeated data collection situation, which occurs today in monitoring both natural and man-made phenomena. Applications include weather, wildlife behavior research, and monitoring geological phenomena such as earthquakes and volcanoes. Similar problems are encountered in monitoring manufacturing, in both discrete parts and continuous flow. Data are collected and used to control and diagnose problems with production and distribution of power, both in power plants and in distribution networks for electricity and for transporting oil, natural gas, and coal. Data are used in controlling and diagnosing problems with motor vehicles such as automobiles, trucks, trains, aircraft, and ships. Data are routinely collected and used to evaluate the health of structures such as bridges, tunnels, hazmat facilities, buildings, towers, scaffolding, cliffs, caves, mines, and oil and gas drilling and production operations, whether land or sea based, as well as ecological and human social, political, economic, and financial systems.
The essence of this invention lies in software that makes it easy for people to design monitoring systems following a reasonable process, such as the 9-step process described by Graves, Rens and Rutz (2011), including the state space compression techniques documented in the Graves, Kovnat and Elliott (2011) provisional patent application.
This discussion of background first reviews prior art relating to monitoring in general before focusing on data compression. It ends with a description of the advantages of the invention over prior art.
2. Monitoring
There is a vast literature on monitoring. Advances appear regularly in leading academic journals such as the Journal of Quality Technology, to name only one.
However, most of this literature provides advances in an important but narrow abstraction of the problem of selecting (a) statistics to monitor and (b) limits to balance the delay to detection with either the probability of false alarms (Wikipedia, “Statistical hypothesis testing”) or (more recently) the false discovery rate (Benjamini and Hochberg 1995). This balance may be achieved in a variety of ways, e.g., by minimizing the false discovery rate with a given limit on expected delay to detection or by placing costs or disutilities on each unit of delay and each false discovery and then minimizing the expected cost or disutility.
One of the major obstacles to the growth of automatic monitoring and control systems is the limited availability to responsible decision makers of information on how to easily install and manage such systems to help them make better decisions. Civil engineers and their managers don't want to look at “all those squiggly lines” representing the behavior over time of infrastructure they manage, because they don't know how to translate that information into improved understanding of the condition of the inventory they manage and the consequences of delaying action.
Wenzel (2009, p. 423) outlined four levels of damage identification: (1) Detection, (2) Localization or isolation, (3) Quantification, and (4) Prognosis. There is an opportunity to help people learn basic concepts and principles of monitoring and control and translate those principles into appropriate action through software that provides an adaptable graphical user interface that makes it easy for naive users to think through the complexities of a problem they face and translate their thoughts into useful action. Adaptability means that the system would also be easy to use for an expert trying to get some modification not provided by simplistic solutions acceptable for many problems. Action includes the selection of hardware and data communications, storage and processing protocols with various kinds of adaptable decision limits connected to different responses. Limits may be designed to identify outliers, inadequate predictions, or inappropriate values for estimated parameter(s). Violation of the limits should generate responses such as the following:
There are opportunities to improve the practice of monitoring by helping people translate available information into appropriate limits and actions triggered by violations of said limits. Users also need help in understanding that initial theoretical computations of limits should in many cases be subsequently evaluated and revised by reference to actual experience; as far as we know, this is not adequately discussed in the prior art. This last step of revising limits is critically important, because if people see too many alarms, they learn to ignore them, and the monitoring system can become worse than useless, because it engenders disrespect for the entire system. To combat this problem, the final two steps in the Graves, et al. (2011) “9-Step Process for developing a Structural Health Monitoring System” involve evaluating the monitors in actual use and improving people's use of the 9-step process.
Many advanced monitoring systems today can collect data in such high quantities that they vastly exceed the capacities of data communications and storage equipment available at a reasonable price. Of course the storage required can be expressed in terms of the number of bytes, while the communications capacity can be measured in bytes per unit time. While these are separate concepts, in most applications increasing the sampling rate has implications for both communications and storage unless something is done to collocate some substantive portion of the computations with the metrology using, e.g., so-called smart sensors. This opens opportunities for improving monitor design through intelligent choices about which computations provide high quality information at a reasonable cost to support high rates of sampling but much smaller numbers of bits or bytes transmitted and stored. This latter step can involve data compression. We now consider that literature.
Salomon and Motta (2010) provide a recent survey of the literature on data compression in their 1400-page Handbook of Data Compression, 5th ed. Data compression methods can be divided into lossless and lossy methods depending on whether the original data can be completely restored. Lossless methods are appropriate when any errors in reproduction can create problems. Lossy methods are preferred for data that can be decomposed into informative and noninformative components, where the noninformative component includes measurement error plus potentially finer resolution than is needed for the application of interest. With physical measurements, all realistic methods are ultimately lossy, because the measurements could always be recorded to greater superficial numerical precision. For example, the LabJack U6 programmable data acquisition device allows the user to specify a “resolution index”, giving greater numerical precision while requiring more time to convert each measurement (LabJack, 2010). Nearly all existing data compression algorithms accept the current digital format as given. The present invention provides foundational theory and methods for adjusting the resolution of analog to digital conversion where the needs of the application and the available hardware support that.
A key part of the present application is a lossy compression method that is very different from any other data compression method we've seen. Data collected on the performance of many physical systems start as analog signals that are then converted to discrete numbers at specific points in time. In many cases, the analog to digital conversion process provides more digits than the available instruments can reliably measure. Sufficiently low order bits follow a discrete uniform distribution and provide zero information about the process being measured. Standard lossless data compression algorithms (Wikipedia “data compression”) will in many cases fail to achieve much compression with analog data, and many actually expand rather than compress the data, because the patterns in the data do not match any of the patterns that the algorithm is designed to compress.
These kinds of data require lossy compression, but few lossy compression algorithms exploit the inherently statistical nature of data of these kinds. Moreover, they rarely consider what is known about the underlying physical behavior of the plant (or physical system) being monitored. The methods taught here provide an added incentive to improve the knowledge of the behavior recorded in the data, and those increases in knowledge could provide substantial economic benefits that were not previously considered worth the cost of the research.
Many lossy algorithms that have been developed so far focus on compressing video or audio so humans cannot detect the loss (e.g., Wikipedia “lossy compression”). Recent work has described data compression using piecewise constant approximation (Lazaridis and Mehrotra 2003), Kolmogorov-Sinai entropy (Titchener 2008), a Markov expert system (Cheng and Mitzenmacher 2005), statistical moments (Choi and Sweetman 2009), nonparametric procedures (e.g., Ryabko 2009, 2008), autoregressive moving average summaries (Sridhar et al. 2009), Fourier analysis (e.g., Reddy et al. 2009), extrema (e.g., Fink and Gandhi 2007), neural networks (e.g., Izumi and Iiguni 2006), and time series data mining (Li 2010).
Fleizach (2006) reported good results with “Scientific Data Compression Through Wavelet Transformation”. However, we don't see any general rule in this for deciding how much compression is enough vs. too much.
Shafaat and Baden (2007) discussed “Adaptive Coarsening for Compressing Scientific Datasets”. Their algorithm involves deleting observations from a data set, computing a summary representation from the subset, then inverting the summary operation to interpolate values for the original data. When the differences between the interpolated values and the original data are too great, they stop. These are good ideas, but they provide little guidance for deciding how much error is acceptable.
In principle, piecewise constant approximations could be used in this way for data from accelerometers, for example. However, Lazaridis and Mehrotra (2003), who discussed piecewise constant approximations, used L∞ error bounds; the use of L∞ is equivalent to assuming uniformly distributed noise, which is produced by few physical processes we know. This suggests that more efficient data compression and subsequent information extraction could be achieved by improved modeling of both the underlying physical process, modeled by Lazaridis and Mehrotra as piecewise continuous, and the noise, modeled implicitly by a uniform distribution.
Changes in temperature and displacement can often be modeled with second order differential equations plus measurement noise that may be a mixture of normal distributions but not a uniform. Such second order dynamics could easily be modeled as a hidden Markov process with a two- or three-dimensional state vector consisting of the position, velocity and possibly acceleration. A “Markov expert system” may include such a model as an option but will in general waste resources, including communications bandwidth and data storage capacity, considering alternatives that may be physically impossible for the particular application. Information theory has shown itself to be extremely useful for data compression and communications, but we have so far seen no literature that appropriately considers the known physics of the structure and sensors in so-called “information” or entropy-based data compression and communications. Neural networks and “expert systems” may outperform a Kalman filter that poorly matches the physics. However, we would not expect artificial intelligence to perform as well as real intelligence embedded in an algorithm that appropriately considers the physics of an application.
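By way of illustration only (this sketch is ours and is not taken from the cited references; the sampling interval, noise covariances and other numeric values are hypothetical placeholders), such second order dynamics might be tracked with a small Kalman filter whose state vector contains position and velocity:

```python
# Minimal sketch of a physics-informed Kalman filter for second-order dynamics.
# The state is (position, velocity); all numeric values below are hypothetical
# placeholders, not parameters from any particular application.
import numpy as np

dt = 0.01                                # sampling interval (s), assumed
F = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity state transition
H = np.array([[1.0, 0.0]])               # we observe position only
Q = 1e-4 * np.array([[dt**3 / 3, dt**2 / 2],
                     [dt**2 / 2, dt]])   # process noise from random accelerations
R = np.array([[0.05**2]])                # measurement noise, e.g., from a gauge study

x = np.zeros((2, 1))                     # prior mean: position 0, velocity 0
P = np.eye(2)                            # prior covariance (deliberately vague)

def kalman_step(x, P, y):
    """One predict/update cycle for a single position measurement y."""
    # Predict (migration/deterioration step).
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update (Bayes step).
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    resid = y - H @ x_pred               # innovation (prediction error)
    x_new = x_pred + K @ resid
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new, resid, S

# Example: filter a short synthetic record of noisy position readings.
rng = np.random.default_rng(0)
for t in range(100):
    y = np.array([[np.sin(0.1 * t) + rng.normal(scale=0.05)]])
    x, P, resid, S = kalman_step(x, P, y)
```

The innovation (prediction error) and its covariance computed in each cycle are exactly the quantities that later sections use to decide whether an observation is an outlier or whether a new report is needed.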
Systems for distributed Kalman filtering (e.g., Olfati-Saber 2007) can support models closer to the physics than the data compression algorithms we've seen. However, such systems can be extremely difficult to design and use, because we must either specify the model entirely when we install the sensors or allow the system to be reprogrammed remotely. If the model is completely specified in advance, it may not be feasible to modify the mathematics later to exploit improvements in our understanding of the behavior of the structure. If the system can be reprogrammed remotely, it increases the cost of the smart sensors and computers located with the structure and increases the risks of hacker attacks.
For civil structures, the need for data compression in monitoring and control is widely recognized. Wang and Law (2007) describe “Wireless sensing and decentralized control of civil structures”. They describe a wireless sensor network where Fourier transforms or autocorrelation functions are computed at sensor nodes to reduce the bandwidth requirements for transmitting data within a wireless local area network on the structure of interest. However, with data transmitted at regular intervals, there are still substantial opportunities for further savings in data communications and storage by (a) transmitting data only when important changes are seen in the Fourier or autocorrelation summaries and (b) limiting the number of bits or digits transmitted and stored through appropriate consideration of the 3-part decomposition (1) described in the next section with the advantages of the proposed system.
Huang et al. (2011) provide an overview of “compressive sensing”. This approach assumes that the behavior of the system monitored can be represented in relatively few dimensions. Finding those relatively few dimensions is essentially a problem of principal components or factor analysis, for which a huge variety of solutions have been developed over the years, with different methods optimal for different purposes.
New computer and sensor technologies provide vast opportunities to improve the productivity of human activities through better monitoring and control of all kinds of processes. The major factor limiting the increased use of these technologies is the limited understanding that potential beneficiaries of such monitoring have of the details of design and use of such monitoring systems. Our software is designed to make it easier for hobbyists, engineering students, practicing engineers and others to learn the principles of monitoring and apply them in applications of interest to them. As more people become better able to understand and use monitoring technologies, the rate of growth in use of those technologies will increase. This in turn can be expected to contribute to better decisions regarding how to get more value from existing investments at a lower total cost.
A portion of this software deals with the cost of data communications and storage. This is a major issue, especially with modern smart and wireless sensors deployed in remote locations where the electrical power budget is a major portion of the cost. Existing computer and sensor technology can support collecting data much faster than is needed most of the time and faster than can generally be justified economically, storing numbers with apparent precision far beyond the actual accuracy of the measurement equipment. The present patent application appears to be unique in decomposing monitoring data conceptually into (a) important information, (b) unimportant information, and (c) noise:
Observation = Important + Unimportant + noise    (1)
It is common in statistics to decompose observations into (i) a true but unknown (and often unknowable) component and (ii) noise. We have not previously seen the “true” component being further decomposed formally into “Important+Unimportant”. Existing lossy data compression algorithms implicitly decompose the data into “Important” and “Unimportant+noise”, e.g., in digital images with various levels of granularity or in telephone communications where the voice is intelligible except when it is required to clarify a word, to distinguish, e.g., between “pit” (“papa indigo tango”) and “bit” (“bravo indigo tango”). Examples such as this show that our modern telephone system sometimes fails to preserve important distinctions between words.
There are various methods for estimating the probability distribution of noise. For example, the standard deviation of normal noise can be estimated by a study of gauge repeatability and reproducibility (Wikipedia, “ANOVA Gauge R&R”). There are many other methods for evaluating the probability distribution of noise from the residuals of a model. For example, one common tool for evaluating serial dependence is the autocorrelation function (Wikipedia, “Autocorrelation”). If serial dependence is found in residuals, the model has apparently not captured the entire behavior of the plant. In such cases, the standard deviation of the residuals overestimates measurement error. Similarly, normal probability plots (Wikipedia, “Normal probability plot”) are often used to evaluate whether a normal distribution seems plausible and, if not, to suggest alternatives such as a contaminated normal (Titterington et al. 1985).
If the noise is not normally distributed but follows a distribution from a location-scale family of distributions, the scale factor can still be estimated, e.g., by maximum likelihood or a Bayesian procedure. Each residual is then expressed as an integer multiple of this scale factor. (Autocorrelation, normal probability plots, maximum likelihood and Bayesian estimation are common tools well known among people skilled in the art of data analysis.)
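As a concrete illustration of these diagnostics, the following minimal sketch (using common open-source tools; the residuals array, the lag range and the robust scale estimator are our own hypothetical choices, not prescribed by the references above) computes the residual autocorrelation, draws a normal probability plot, estimates a scale factor robustly, and expresses each residual as an integer multiple of that scale:

```python
# Sketch of simple noise diagnostics on model residuals.  The array below is
# a placeholder for real residuals; tools and thresholds are application specific.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(scale=0.2, size=500)   # placeholder for real residuals

# 1. Autocorrelation of the residuals: strong serial dependence suggests the
#    model has not captured all of the plant's behavior.
def autocorr(x, max_lag=20):
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

print("lag-1..5 autocorrelation:", np.round(autocorr(residuals, 5), 3))

# 2. Normal probability plot: curvature or multiple slopes suggest a
#    non-normal or contaminated-normal noise distribution.
stats.probplot(residuals, dist="norm", plot=plt)
plt.savefig("residual_qq.png")

# 3. Robust scale estimate (median absolute deviation rescaled to be consistent
#    with a normal standard deviation), then express each residual as an
#    integer multiple of that scale, as described in the text.
scale = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))
coded = np.round(residuals / scale).astype(int)
print("scale estimate:", round(float(scale), 4), "first coded residuals:", coded[:10])
```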
The new data compression methods taught herein begin with state space techniques well known in the statistical literature, e.g., Petris et al. (2009) or Dethlefsen and Lundbye-Christensen (2006). The simplest state space model may be an exponentially weighted moving average (EWMA). For a Kalman formulation of an EWMA, Graves et al. (2002) described how to use (a) a gauge repeatability and reproducibility study (Wikipedia, “ANOVA gauge R&R”) to estimate the observation noise level and (b) reliability data to estimate the drift rate (i.e., the probability distribution of the Kalman migration step). This provides two important advantages over other methods for compressing scientific data, e.g., Fleizach (2006) or Shafaat and Baden (2007): First, it provides statistical theory and a scientific procedure (gauge R&R) for evaluating “how good is good enough?” Second, it incorporates state space representations that can provide a very parsimonious summary that is as good as the physical theory behind the state space representation chosen. The state space representation also includes its own estimate of the uncertainty in its representation of the underlying phenomenon. We have seen nothing else in the literature that explicitly considers the uncertainty in knowledge about the plant.
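For readers unfamiliar with this EWMA/Kalman connection, the following sketch (our own illustration; the variance values are hypothetical stand-ins for estimates that would come from a gauge R&R study and from reliability or drift data) computes the steady-state Kalman gain of a local-level model, which plays the role of the EWMA weight:

```python
# Sketch: steady-state Kalman gain for a local-level (random walk plus noise)
# model, which is the Kalman formulation of an EWMA.  The numbers are
# hypothetical; R would come from a gauge R&R study and Q from drift or
# reliability information, as described in the text.
import math

R = 0.05 ** 2          # observation noise variance (assumed from gauge R&R)
Q = 1e-5               # variance of the random drift per sampling interval

q = Q / R              # signal-to-noise ratio
P = R * (q + math.sqrt(q * q + 4.0 * q)) / 2.0   # steady-state prior variance
lam = P / (P + R)      # steady-state Kalman gain = EWMA weight

print(f"EWMA weight (steady-state Kalman gain): {lam:.4f}")

# The corresponding EWMA update for each new observation y is then:
#     estimate = (1 - lam) * estimate + lam * y
```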
For example, thin plate splines (Wikipedia, “Thin Plate Spline”) or some other suitable basis set could be used for functional data analysis (Ramsay et al. 2009) of turbulent flow, decomposing the results further into (a) a solution of Navier-Stokes equations, (b) a component that may still represent phenomena different from the hypothesised Navier-Stokes model, and (c) measurement error. This could be applied adaptively as suggested by Shafaat and Baden (2007), but could achieve substantially greater compression through the use of appropriate physical models for the phenomena under study.
Much of the following discussion describes normal observations on a multivariate normal state space with the deterioration or migration including normally distributed random increments. This is chosen for ease of exposition. As anyone skilled in the art of state space modeling knows, the ideas generalize to arbitrary observations and even arbitrary state spaces with very general evolution and deterioration processes, including observations following discrete distributions whose parameters follow some linear or nonlinear evolution. If the mathematics becomes difficult, it can be approximated in many cases with local linearizations of nonlinear processes. If that is not adequate, one can always move to something like particle filtering (Xue et al. 2009) and/or Markov Chain Monte Carlo (with an increase in the cost of computations).
The key idea is that the last stored model is used as long as it adequately predicts current behavior, with “adequacy” being defined relative to the magnitude of the “unimportant+noise” terms in the decomposition (1): Data inconsistent with predictions trigger data transmission. If the inconsistency seems to be an outlier (or group of outliers) inconsistent with the model, the outlier(s) is (are) transmitted. If the inconsistency suggests the system is following a state space model with the same general structure but different numeric values, that model is updated. In either case, the number of bits or digits transmitted is chosen to drop anything unimportant relative to the “unimportant+noise” portion of (1). Some experimentation might be appropriate to determine optimal rounding, but one would expect that anything smaller than 0.01 times the standard deviation of “important+noise” would contain very little information of practical importance, and in some cases, numbers rounded to the nearest standard deviation of “important+noise” might still retain sufficient information that greater numeric precision may not be worth the cost. This information can then be further compressed using any appropriate data compression system, e.g., transmitting only the differences from the last update when the change is modest relative to the overall magnitude of the numbers. This can result in massive reductions in the cost of data communications and storage, well beyond the current state of the art.
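A deliberately simplified sketch of this key idea follows. It uses an EWMA as the local predictive model; the reporting threshold, noise scale and rounding grid are hypothetical and would be tuned per application as discussed above, and a production implementation would also handle outliers and raw-data sampling as described elsewhere in this application:

```python
# Simplified sketch of the "report only on inconsistency" idea using an EWMA
# as the local model.  Thresholds, scales and the rounding grid are
# hypothetical and would be chosen per application.
import numpy as np

class EwmaCompressor:
    def __init__(self, lam=0.1, sigma=0.05, k_report=4.0, grid_fraction=0.1):
        self.lam = lam                     # EWMA weight
        self.sigma = sigma                 # std. dev. of "unimportant + noise"
        self.k_report = k_report           # report when |error| > k_report * sigma
        self.grid = grid_fraction * sigma  # rounding grid for transmitted values
        self.local_level = 0.0             # running local estimate
        self.reported_level = 0.0          # last value sent to the central site

    def quantize(self, value):
        """Round to the grid so bits carrying only noise are dropped."""
        return round(value / self.grid) * self.grid

    def step(self, y):
        """Process one observation; return a report (or None) for transmission."""
        self.local_level = (1 - self.lam) * self.local_level + self.lam * y
        error = self.local_level - self.reported_level
        if abs(error) > self.k_report * self.sigma:
            self.reported_level = self.quantize(self.local_level)
            return self.reported_level     # transmit an update
        return None                        # prediction still adequate: send nothing

# Example: a step change in the monitored signal triggers a few reports, while
# most observations produce no transmission at all.
rng = np.random.default_rng(2)
comp = EwmaCompressor()
signal = np.concatenate([np.zeros(200), np.full(200, 0.5)])
reports = [r for y in signal + rng.normal(scale=0.05, size=400)
           if (r := comp.step(y)) is not None]
print(f"{len(reports)} reports for {len(signal)} observations")
```

In this toy run, typically only a few reports are transmitted for 400 observations, illustrating the potential reduction in communications and storage.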
These data compression methods can exploit models of the behavior of the plant being monitored. However, no model is perfect. Accordingly, in addition to transmitting model updates and outliers, a sample of raw data will also be transmitted for subsequent off-line analysis and model improvement efforts (rounded as before to some fraction of the standard deviation of “important+noise”). Moreover, the best lossy compression algorithms could in many cases be improved by applying them to the residuals of the samples from the state space predictions, possibly even “sampling” 100 percent, so that all the data are compressed in this way.
These data compression methods can be adaptive with details of the remote data compression algorithm reprogrammed based on earlier data and analyses. Reprogramming can be done either manually or automatically. It can change thresholds for outlier detection and issuing a new report on the condition of the plant. It can also change the basic state space model.
Also, in many applications with remote monitoring equipment, the on-site computer can store raw data that is not transmitted but kept locally for some period of time with older data being routinely overwritten by the new. This is similar to flight recorders on aircraft and can be used for similar forensic engineering purposes.
These cost reductions in data transmission and storage create opportunities for completely new data analysis methods, far beyond anything we have seen in the literature to date.
This patent application describes a software system for helping people follow a structured approach to designing monitoring systems.
The software system will also include the option of using at appropriate points a new data compression system that summarizes available data as (a) a probability distribution over possible states of a hypothesized plant, combined with (b) a rule describing the evolution (deterministic, stochastic, or a combination) of that probability distribution over time, and (c) a model for observations conditioned on the state of the plant. As each new observation arrives, the previously estimated probability distribution over possible states (the former “posterior” distribution) is updated to the current time, thereby producing a “prior” distribution, which is combined with the latest observation(s) using Bayes' theorem to form the new “posterior”. (This two-step Bayesian sequential updating cycle is further described in Graves 2007 and Graves et al. 2001, 2005.)
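For reference, this two-step cycle can be written generically in standard Bayesian filtering notation (a generic statement of the recursion, not a formula quoted from the cited works):

```latex
% Step 1: evolution of the previous posterior to the current time (the "prior").
% Step 2: Bayes' theorem combining that prior with the latest observation y_t.
\begin{align*}
p(x_t \mid D_{t-1}) &= \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid D_{t-1})\, dx_{t-1}
  && \text{(evolution / prior)} \\
p(x_t \mid D_t) &\propto p(y_t \mid x_t)\, p(x_t \mid D_{t-1})
  && \text{(Bayes update / posterior)}
\end{align*}
```

Here x_t denotes the state of the plant, y_t the latest observation(s), and D_t the information available through time t, matching the Dt|v notation used later in this application.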
The methods taught in this patent application are organized into eight parts: [1] Overview of distributed processing. [2] Monitor design process. [3] Defining “Good” and “Bad”. [4] Overview of the novel state space compression concept. [5] Local data processing at the (typically remote) site of data collection. [6] Data transmission and storage. [7] Data analysis for detecting problems. [8] Data analysis for improving models. There is one Figure for each part.
Reference Numerals (In this patent application, no numbers are shared between figures, and the first digit provides the Fig. number):
With smart sensors 110 and/or a remote data concentrator 150, computations can be performed at various places such as the smart sensors, the data concentrator(s) and/or the primary (possibly cloud) data center 170.
A general rule is to push as much of the computation as feasible as close to the data collection site and physical sensors as possible. This follows because data communications often dominate the power requirements at remote locations, especially since the power consumed by many sensors is quite low. The modern microprocessors used in many smart sensors consume relatively little power for computations. This encourages users of smart and wireless sensors to do much of their computation at the sensor node and only transmit terse summaries to a data concentrator at a relatively low frequency. This is especially true with wireless sensors, which may be powered by energy harvesting of solar power, local vibrations, or wind, for example, depending on the exact location. In such cases, it may be wise to have the smart sensors store data and statistical summaries, such as parameter estimates in state space models, locally and only report under special circumstances. If the available power varies with time of day, weather and other conditions, noncritical reports may be stored until adequate power is available, preserving the power required to provide immediate reports if conditions so indicate. This is discussed in more detail with
Eventually (and sooner rather than later under exceptional conditions), data or summaries arrive at a primary data repository (such as a cloud computing center), where they are evaluated and stored, with possible immediate actions taken as detailed further with
One embodiment of the present invention is in the form of software to provide a structure to help people follow a sensible process for designing monitors for a variety of processes of interest. A “monitor” in this context is a system for collecting data at potentially informative times, transmitting either raw data or summary statistics or both at selected times that may be informative, and using said data and/or summary statistics to determine possible interventions either to prevent the system monitored from malfunctioning or to minimize the damage from a malfunction. The structure will provide a step by step process for developing a monitor such as the one outlined in
The novelty here is software that makes it easy for people new to designing monitors, as well as people experienced in the field, to design effective monitors with less effort required to remember and do everything needed to produce a monitor with the desired characteristics.
A lead OBD (on-board diagnostics) engineer said that OBD is a problem that looks easy but is in fact hard. This makes career management very difficult for OBD engineers, because their managers have difficulty understanding how OBD design could cost as much as it does. One seemingly minor portion of the difficulties is the first step in
Box et al. (2000), Graves (2007), and Graves et al. (2011, 2005, 2001) recommend designing monitoring systems by first defining good and bad, then describing how good systems go bad. Cusums have optimality properties for abrupt changes, while more gradually adaptive algorithms such as exponentially weighted moving averages or more general Kalman filters respond better to gradual deterioration. Any of the standard monitoring algorithms can be derived from a two-step Bayesian sequential updating cycle by suitable selection of assumptions for the underlying probability distribution over the state of the plant, the model for how the plant deteriorates, and how the observations relate to the condition of the plant. These ideas are the core of
Data collection on virtually any process starts with sensors making observations (item 402 in
In many applications, the data capture uses a computer near the site where the data are collected 404, which may store data and summary statistics in local transient storage 406. Whenever a need for reporting is perceived, data are transmitted 408 to a central location 410 for real time monitoring, which issues alarms 412 as appropriate. As is obvious to anyone skilled in the art, the exact encoding and even the resolution of analog to digital conversion could be adjusted in real time in reaction to other events. For example, the detection of an earthquake in one place could cause the central processing to send commands to remote locations tightening thresholds and scale factors so more data is reported from the remote locations to the central site. These possibilities are not noted in
While we use the word “alarm” here (and elsewhere in this application), the basic ideas could be extended by someone skilled in the art to any real time action, including labeling observations as “exceptions” for future reference, as mentioned above.
A key element of the present invention is deciding when to report data from the remote site 404 to the central site 410 based on the degree to which current behavior of the plant is consistent with predictions based on the most recent previous report. These decisions will typically consider both prediction error bounds and the distance between the estimated state of the plant and some boundary representing a malfunctioning state. These prediction error bounds may be computed using standard statistical theory well known to those skilled in the art. Alternatively, as time passes, if the estimated state of the plant has not changed substantively for a while, that fact can be used to narrow the prediction error bounds. The exact algorithm for narrowing the prediction error bounds may use some exact theory (possibly with Monte Carlo) or a heuristic perceived to provide an acceptable approximation to what might be determined by a more theoretically grounded algorithm.
Real time processing 410 relies on an active database 414 for routine computations. As data ages, the demand for it decreases and some of it is archived 416 to a low activity database 418, where it may still be used for offline analysis 420 to produce management and scientific or engineering reports 422 on how to improve the operations of the system for the future.
Although
Details of local data processing and data compression 404-408 are described in a section below devoted to
The data compression algorithm taught here rests essentially on Bayesian concepts. Each processing cycle begins with a set Dt|v 502 containing all the information available at time v about the state of the plant at time t, t≧v. Processing typically begins with t=v=0 with D0|0 being typically though not necessarily the empty set. Associated with Dt|v is a probability distribution pt|v.
When each new observation yt arrives 504, it is first checked for consistency with the best information previously available, summarized in pt|t-1 506. If the probability of it (or of a recent string of observations) is unrealistically low, it is labeled an outlier 508. This could involve comparing yt with absolute limits. It could also involve comparing the difference between yt and predictions per pt|t-1 with limits on the absolute prediction error. In addition, the prediction error could be divided by its estimated standard deviation and compared with limits. This evaluation could also be based on processing multiple observations simultaneously. This could be important if the response were categorical rather than continuous.
Each outlier is further processed 510 to determine if it is sufficiently extreme to require an immediate report 512 to the central repository 410 of
If yt is consistent with pt|t-1 516, we then want to know if it (possibly combined with observations since the last update time u) is (are) consistent with the last reported state space model pt|u 518. This evaluation may use standard statistical theory for determining prediction limits possibly shrunk to account for the information contained in the fact that another update has not been made since time u, as discussed with 404 in
If the latest observation is consistent with the last reported state space model pt|u 520, it is added to the database and used to compute the posterior distribution pt|t 522. If yt is not consistent with pt|u 524, the latest posterior pt|t is computed and transmitted to a central repository as an update 526, where it can be used for both real time monitoring and off line data analysis to improve management of the plant, possibly via improved scientific/engineering understanding of its behavior.
The processing of
In addition to the immediate processing 506-526, each observation is written to local transient storage. From there asynchronously, all outliers and samples of other observations are transmitted 528 to the central repository for further processing. These samples may be selected via simple random sampling or in bursts with randomly selected starting points or systematically, initiated under certain conditions. For example, Rutz and Rens (2008) sampled data every 0.1 sec. but only when the wind was of certain intensities and from specific directions.
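A minimal sketch of one such burst sampling scheme follows (the burst length, the number of bursts per block and the block size are hypothetical choices for illustration; the triggering conditions described above are omitted):

```python
# Sketch of burst sampling of raw data for asynchronous transmission: select
# contiguous bursts of observations starting at randomly chosen times.
import numpy as np

def burst_sample_indices(n_obs, burst_len=100, n_bursts=3, rng=None):
    """Return sorted indices of raw observations selected in random bursts."""
    if rng is None:
        rng = np.random.default_rng()
    starts = rng.choice(max(n_obs - burst_len, 1), size=n_bursts, replace=False)
    idx = np.concatenate([np.arange(s, s + burst_len) for s in starts])
    return np.unique(idx)

# Example: pick three 100-observation bursts out of a 10 000-observation block.
print(burst_sample_indices(10_000)[:10])
```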
This transient storage can then be accessed manually, typically after a failure of the plant, as it may provide more detail of the recent history. With a failure of this local monitoring system (e.g., accompanying a failure of the plant monitored), this transient storage may help people understand the failure.
For the present applications, we could apply any reasonable method for lossy data compression to the numbers in the state space representation and to the random samples and outliers identified for transmission to the primary data repository (170 in
Design of a reasonable state space compression system must carefully consider the three components of expression (1) above along with the migration/deterioration portion of the state space model. If typical changes in the state are small relative to the typical magnitude of the numbers [e.g., the “important” part of (1)], a standard tool of data compression is to transmit and store the change from the previous update, as it would have fewer bits or digits than the whole number. To protect against problems from transmission errors, it may be wise in such cases to schedule transmission of the full numbers after dropping part of the number that clearly represents “noise” in (1) and possibly also part or all of the “unimportant”. In any event, it will only rarely be necessary to carry numbers more accurate than some fraction (such as 0.1 or 0.3) of the maximum of the noise standard deviation and a comparable measure of what variations would be considered unimportant. In many cases, this can be implemented by appropriate centering and scaling, i.e., subtracting a center from each number and dividing by a scale factor, then rounding the result to obtain an integer for transmission and/or storage.
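The following sketch illustrates this centering, scaling and rounding, with differences from the previous coded value transmitted most of the time and a full (rounded) value sent periodically to guard against transmission errors. The center, scale and refresh period are hypothetical placeholders:

```python
# Sketch of the centering/scaling/rounding scheme described above, with
# differences transmitted most of the time and a full (rounded) value sent
# periodically.  Center, scale and refresh period are hypothetical.
import numpy as np

class DeltaCoder:
    def __init__(self, center=0.0, scale=0.01, refresh_every=100):
        self.center = center            # subtracted before coding
        self.scale = scale              # e.g., 0.1-0.3 x noise standard deviation
        self.refresh_every = refresh_every
        self.count = 0
        self.last_coded = None

    def encode(self, value):
        """Return ('full', int) or ('delta', int) for transmission."""
        coded = int(round((value - self.center) / self.scale))
        self.count += 1
        if self.last_coded is None or self.count % self.refresh_every == 0:
            self.last_coded = coded
            return ("full", coded)
        delta, self.last_coded = coded - self.last_coded, coded
        return ("delta", delta)         # usually a much smaller integer

# Example: a full value is sent first, then small integer differences.
coder = DeltaCoder(center=20.0, scale=0.01)
for v in [20.03, 20.05, 20.04, 20.11]:
    print(coder.encode(v))
```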
This gives us several things to consider in tailoring a state space compression algorithm to a particular application: (a) Modifying the state space model used, e.g., replacing an exponentially weighted moving average with a model that considers temperature or time of day in making predictions. When appropriate, modifications like this can reduce the noise, making predictions more accurate and possibly reducing the frequency with which reports are required to achieve a given level of accuracy in predicting future observations. (b) Adjusting the sampling frequency, i.e., thresholds for when an update is required or an observation is declared an outlier or selecting a sampling frequency for raw data. (c) Adjusting the number of bits or digits to carry in the numbers, as just discussed. In many cases, the system can be optimized by standard methods of empirical optimization (e.g., Box and Draper 2007), especially if the remote system of
By enhanced data flow management, we mean managing increasing volumes of data and proactively designing and managing a database management system with the flexibility to adjust quickly to rapid changes in the volumes of data and the number of sources. A small part of this is outlined in
If the new data are from a new epoch, the decision 610 then flows 612 to a step 614 that creates a new database instance for the new epoch 616 while initiating a process to move some older data as convenient to slower access, long term storage 618 to create space on the faster access storage for more data. Whether the new data are from a new epoch or not 620, they are stored 622 in the active database for that epoch 616. Then at appropriate intervals, data may be further compressed, summarized or subsampled for long term active reference 624 in a master database 626. For convenience in this discussion, we may refer to a “Daily database” as shorthand for “Active database for the current epoch”, even if the chosen epoch is not a single day, and the actual system may involve more than two levels, e.g., with a fast epoch of one hour storing summaries in a database with a slow epoch of a week, which in turn stores fewer summaries in the final master summary database that is presumably sufficiently sparse not to require further data concentration.
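A minimal sketch of this epoch-based routing follows, assuming purely for illustration an hourly epoch and in-memory dictionaries standing in for actual database instances:

```python
# Sketch of epoch-based routing of incoming summaries: a new "Daily database"
# instance (here just a dictionary entry) is created whenever data arrive for
# a new epoch, and closed epochs are rolled up into a master summary store.
from collections import defaultdict
from datetime import datetime

EPOCH_FORMAT = "%Y-%m-%d-%H"            # one epoch per hour (hypothetical)
active = defaultdict(list)              # epoch key -> active "database instance"
master = []                             # long-term master summary store

def roll_up_old_epochs(current):
    """Move summaries of closed epochs to the master store (very simplified)."""
    for epoch in [e for e in active if e < current]:
        records = active.pop(epoch)
        master.append({"epoch": epoch, "n": len(records)})

def store(record):
    """Route one record to the active instance for its epoch."""
    epoch = record["time"].strftime(EPOCH_FORMAT)
    if epoch not in active:
        # A real system would create a new database instance here and schedule
        # older epochs for archival / roll-up.
        roll_up_old_epochs(current=epoch)
    active[epoch].append(record)

store({"time": datetime(2011, 1, 10, 9, 5), "value": 1.2})
store({"time": datetime(2011, 1, 10, 10, 0), "value": 1.3})
print(len(active), "active epoch(s);", len(master), "master summary row(s)")
```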
With appropriate data compression at a remote site, the volume of data at a central site may be low enough that the “Daily database” and “Master summary database” may be combined.
With large numbers of sites, a “Daily database” instance may be constructed for each combination of epoch and remote site (or data source). Meanwhile, the “Master summary database” may be combined for a group of sites sharing common characteristics, in addition to or in lieu of having a “Master summary database” for each site. For example, to manage an inventory, e.g., of bridges, it may be desirable to construct a combined “Master summary database” organized so it is easy to combine data from sources (bridges) of similar design, age, material, length, or any other characteristic of interest.
The two-step Bayesian sequential updating cycle described in Graves (2007) and Graves et al. (2005, 2001) needs to be modified for multistage processing of data collected and reported as taught in the present patent application. As noted with
Raw data 706 must be stored 708 for future reference in studies of whether and how models can and should be improved. Outliers 710 must first be evaluated to determine if immediate action is required 712. If the observation is a statistical outlier without apparent practical importance 714, the observation should still be processed and stored for future reference as appropriate 716. This processing may differ depending on whether the state space models estimated at the remote site 720 include explicit consideration of outliers, e.g., by assuming that outliers follow a contaminated normal distribution. If the outlier is sufficiently extreme to require immediate action 718, an appropriate alarm is issued to any of several possible stakeholders depending on the exact nature of the outlier. In either case, the outliers would then be further processed to prepare the system to react more appropriately to outliers received in the future, possibly increasing the probability of taking action upon the arrival of other observations.
The monitoring system must include procedures (not shown in
Processing of state space changes 720 will be somewhat different from traditional theory, because they will only be reported if the last reported state space model estimate is inconsistent with the currently stored model. This fact means that the absence of a report itself provides information that the changes since the last reported state space estimate are not great, e.g., in step 728 and the result that is stored 730. Conversely, reported outliers 710 may provide evidence questioning the need to narrow such limits, e.g., in step 716. These observations suggest opportunities to modify the standard deterioration step in the Bayesian two-step sequential updating cycle. Careful statistical analysis might provide a precise method for modeling deterioration to incorporate the information contained in the absence of an update. However, sensible results will likely be obtained by simpler ad hoc adjustments merely limiting the growth in the uncertainty of the probability model portion of the state space estimate.
If immediate action seems to be required, any of a number of previously programmed alternative actions will be taken 726. Whether or not 724 immediate action is required, the new state space model may be further processed for possible future alarms or reports. A first step in this will be to store the newly reported state space model in the active database. Further processing may use the new report to update a variety of potentially more sophisticated models developed since the design of the remote monitoring system. This might be used in conditions where the remote monitoring system may be difficult to update or of limited computational capacity while the central processing might be more easily changed and enhanced to reflect new knowledge acquired in management, scientific or engineering studies as discussed with
State space data compression opens many possibilities for completely new methods of statistical analysis to exploit its unique character. We consider distribution analysis 802 in
One of the most powerful methods for univariate distribution analysis is a QQ plot, especially a normal probability plot. However, the raw data available will typically be a mixture of 100 percent of observations beyond certain limits (outliers) and some small percentage of samples from the central region of the distribution. QQ plotting algorithms will need to be modified to consider the limits and the sampling frequencies in different ranges.
One of the primary reasons for making QQ plots, especially normal probability plots, is to help understand any outlier mechanism. Normal data plot close to a straight line in a normal probability plot. Data with outliers that come from a completely different distribution typically present the appearance of two or more straight line segments in a normal plot. For example, if the outliers come from a normal distribution with a higher standard deviation, the data from the distribution with the smaller standard deviation will appear straight with one slope, while much of the data from the distribution with the higher standard deviation may appear at both ends of the central distribution with a common but different slope proportional to the higher standard deviation. From examining the plot, one can get rough estimates of the means and standard deviations of the two components of the mixture as well as the percent of observations from each distribution (Titterington et al. 1985). On other occasions, a normal probability plot may look like a relatively smooth curve. This could indicate a need for a transformation or possibly a skewed or long-tailed distribution. However, with raw data sampled as described here, the sampling method must be considered appropriately in construction of a QQ plot; without that, the image in the plot could be very misleading.
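The following sketch shows one way such a sampling-aware normal probability plot might be constructed, weighting each retained observation by the inverse of its sampling probability; the reporting limit, the central sampling rate and the simulated data are hypothetical:

```python
# Sketch of a normal probability (QQ) plot that accounts for the sampling
# scheme described above: observations beyond the reporting limits are kept
# with probability 1, while observations inside the limits are sampled at
# rate f.  Inverse-probability weights give approximately unbiased plotting
# positions.  Limits and the sampling rate are hypothetical.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

limit, f = 0.6, 0.02                    # outlier limit and central sampling rate
rng = np.random.default_rng(3)
full = rng.normal(scale=0.3, size=50_000)

# Simulate what the central site actually receives.
is_outlier = np.abs(full) > limit
kept = np.concatenate([full[is_outlier],
                       full[~is_outlier][rng.random((~is_outlier).sum()) < f]])

# Inverse-probability weights: 1 for outliers, 1/f for sampled central values.
weights = np.where(np.abs(kept) > limit, 1.0, 1.0 / f)

order = np.argsort(kept)
x = kept[order]
w = weights[order]
# Weighted plotting positions, i.e. a weighted empirical CDF.
p = (np.cumsum(w) - 0.5 * w) / w.sum()
theoretical = stats.norm.ppf(p)

plt.plot(theoretical, x, ".")
plt.xlabel("theoretical normal quantiles")
plt.ylabel("observed values")
plt.savefig("weighted_qq.png")
```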
Similarly, methods for evaluating serial dependence will need to consider the sampling methodologies used with the raw data. Sampling bursts of data will support estimation of short term serial dependence. Analyses of observations collected farther apart in time will need to consider the time differences between observations.
Similar plots might be made of the mean vectors in the state space representation and of their first differences. However, again the reporting process must be appropriately considered in the construction of distributional analyses.
Good statistical practice typically starts, as just outlined, with univariate distributional analyses. This is because multivariate analyses imply certain assumptions for the univariate components, and violations of these assumptions are so common that much time can be wasted on inappropriate multivariate analyses if the basic univariate distributional assumptions are not checked first.
The new state space data compression methods taught in this patent application provide opportunities for at least three very different kinds of multivariate analyses. First, if the state space model used at the remote site includes multivariate observations, then the multivariate residuals should be considered for consistency with the multivariate observation component of the state space model. Multivariate normal residuals from a state space model can be examined together via Hotelling's T-square. That statistic follows either a scaled chi-square or a scaled F distribution, and QQ plots appropriate to those distributions can profitably be examined (adjusting as before for the sampling methodology). Other plots of observations and residuals can be used to look for relationships different from those assumed in the model.
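A minimal sketch of such a multivariate residual check follows (for simplicity it uses simulated residuals and the chi-square reference; a more careful analysis would use the F distribution where appropriate and adjust for the sampling scheme as discussed above):

```python
# Sketch of a multivariate residual check: compute a T^2-type statistic for
# each multivariate residual and compare the results to a chi-square reference
# with a QQ plot.  The covariance is estimated from the residuals themselves,
# and the residuals below are simulated placeholders.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
resid = rng.multivariate_normal([0, 0, 0], np.diag([1.0, 0.5, 0.2]), size=2_000)

cov_inv = np.linalg.inv(np.cov(resid, rowvar=False))
centered = resid - resid.mean(axis=0)
t2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)  # T^2 per residual

# QQ plot of the T^2 values against a chi-square with p degrees of freedom.
stats.probplot(t2, dist=stats.chi2, sparams=(resid.shape[1],), plot=plt)
plt.savefig("t2_chi2_qq.png")
```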
Beyond this, in many cases, the data compression at the remote site will involve relatively simple models, e.g., exponentially weighted moving averages (EWMAs), while more complicated models can be developed later to exploit a better understanding of the relationships between variables. For example, data from a bridge might include temperature and various measures of the motion of the bridge due to thermal effects. The installation at the remote site might provide updates on all variables simultaneously or apply a state space compression algorithm to each variable separately. Analysis of data from simultaneous updates will be easier, but separate compression of each variable might be more efficient in the cost of data communications and storage, depending on the reporting frequencies of the different models.
With asynchronous reporting, various methods can be used to look for relationships between different variables. For example, pseudo-observations can be constructed at selected points in time for variables of interest using the state space models. An advantage of this is that each pseudo-observation comes with an estimate of standard error that could be used in the analysis. However, these pseudo-observations will rarely be statistically independent. This fact will invalidate standard statistical tests that might otherwise be performed. New models may need to be tested using the samples of raw data reported to the central database.
Alternatively, techniques for functional regression and correlation might be used (Ramsay et al. 2009). This might be particularly valuable with monitoring thermal effects on a structure, where temperature is measured at only one point on the bridge and uniform heating cannot be assumed.
This patent application teaches those skilled in the arts of data compression and statistical analysis how to dramatically reduce the volume of data transmitted and stored to characterize the evolution of a system of interest, called a “plant” for consistency with the control theory literature. It does this by summarizing virtually any kind of data into an appropriate state space model and transmitting and storing the state space model only when the previously stored model does not adequately predict recent observation(s), and transmitting only enough bits or digits to retain the important information. This document also provides an overview of special data analysis procedures required to extract information from this new data compression format. This patent application also teaches basic concepts of data flow management as applied to data compressed using the state space summarization methods taught herein. These techniques can be applied with a simple monitoring system involving only one computer or a distributed system involving multiple levels of data compression and analysis following this general outline before the data arrive at a central data center for global data analysis and storage. These methods become increasingly important with increases in the numbers of sensor nodes, sampling frequency, plants being monitored and with the general complexity of the infrastructure.
Wang, Yang, and Law, Kincho H. (2007) Wireless Sensing and Decentralized Control for Civil Structures: Theory and Implementation, Report No. 167, Blume Earthquake Engineering Center, Stanford U. (https://blume.stanford.edu/tech_reports, accessed 2010 Dec. 4)
This application claims priority to U.S. Provisional Application No. 61/431,193, filed Jan. 10, 2011, and International Application No. PCT/US2010/002162 filed Aug. 4, 2010. These are incorporated herein by reference in their entireties.