The present disclosure relates to the field of anomaly detection in machines, and more particularly to use of machine learning for near real-time detection of engine anomalies.
Machine learning has been applied to many different problems. One problem of interest is the analysis of sensor and context information, and especially streams of such information, to determine whether a system is operating normally, or whether the system itself, or the context in which it is operating, is abnormal. This is to be distinguished from operating normally under extreme conditions. The technology therefore involves decision-making to distinguish normal from abnormal (anomalous) operation in the face of noise and extreme cases.
In many cases, the data is multidimensional, and some context is available only inferentially. Further, decision thresholds should be sensitive to the impact of different types of errors, e.g., type I, type II, type III and type IV.
Anomaly detection is a method to identify whether or not a metric is behaving differently than it has in the past, taking into account trends. This is implemented as one-class classification since only one class (normal) is represented in the training data. A variety of anomaly detection techniques are routinely employed in domains such as security systems, fraud detection and statistical process monitoring.
Anomaly detection methods are described in the literature and used extensively in a wide variety of applications in various industries. The available techniques comprise (Chandola et al., 2009; Olson et al., 2018; Kanarachos et al., 2017; Zheng et al., 2016): classification methods that are rule-based, or based on Neural Networks (see, en.wikipedia.org/wiki/Neural_network), Bayesian Networks (see, en.wikipedia.org/wiki/Bayesian_network), or Support Vector Machines (see, en.wikipedia.org/wiki/Support-vector_machine); nearest neighbor based methods, (see, en.wikipedia.org/wiki/Nearest_neighbour_distribution) including k-nearest neighbor (see, en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) and relative density; clustering based methods (see, en.wikipedia.org/wiki/Cluster_analysis); and statistical and fuzzy set-based techniques, including parametric and non-parametric methods based on histograms or kernel functions.
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression: In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k=1, then the object is simply assigned to the class of that single nearest neighbor. In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. Both for classification and regression, a useful technique is to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor. The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. A limitation of the k-NN algorithm is that it is sensitive to the local structure of the data.
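By way of illustration only, distance-weighted k-NN regression with the 1/d weighting scheme described above may be sketched in R (the statistical environment employed in some embodiments below); the data, the choice of k, and the function name are merely exemplary:

# Illustrative sketch (R): distance-weighted k-NN regression with 1/d weights.
# train_x, train_y and k are hypothetical example inputs.
knn_regress <- function(query, train_x, train_y, k = 3) {
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))   # Euclidean distance to each training example
  nn <- order(d)[1:k]                              # indices of the k nearest neighbors
  w <- 1 / pmax(d[nn], .Machine$double.eps)        # 1/d weights; guard against zero distance
  sum(w * train_y[nn]) / sum(w)                    # weighted average of neighbor values
}
set.seed(1)
train_x <- matrix(runif(100), ncol = 2)
train_y <- rowSums(train_x) + rnorm(50, sd = 0.05)
knn_regress(c(0.5, 0.5), train_x, train_y, k = 5)  # approximately 1.0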
Zhou et al. (2006) describe issues involved in characterizing ensemble similarity from sample similarity. Let Ω denote the space of interest. A sample is an element in the space Ω. Suppose that α∈Ω and β∈Ω are two samples; the sample similarity function is a two-input function k(α, β) that measures the closeness between α and β. An ensemble is a subset of Ω that contains multiple samples. Suppose that A={α1, . . . , αM}, with αi∈Ω, and B={β1, . . . , βN}, with βj∈Ω, are two ensembles, where M and N are not necessarily the same; the ensemble similarity is a two-input function k(A, B) that measures the closeness between A and B. Starting from the sample similarity k(α, β), the ideal ensemble similarity k(A, B) should utilize all possible pairwise similarity functions between all elements in A and B. All these similarity functions are encoded in the so-called Gram matrix. Examples of ad hoc construction of the ensemble similarity function k(A, B) include taking the mean or median of the cross dot products, i.e., the upper right corner of the Gram matrix. An ensemble may be thought of as a set of i.i.d. realizations from an underlying probability distribution p(α). Therefore, the ensemble similarity is an equivalent description of the distance between two probability distributions, i.e., the probabilistic distance measure. Denoting the probabilistic distance measure by J(A, B), we have k(A, B)=J(A, B).
Probabilistic distance measures are important quantities that find use in many research areas such as probability and statistics, pattern recognition, information theory, and communication. In statistics, probabilistic distances are often used in asymptotic analysis. In pattern recognition, pattern separability is usually evaluated using probabilistic distance measures such as the Chernoff distance or the Bhattacharyya distance, because they provide bounds on the probability of error. In information theory, mutual information, a special example of the Kullback-Leibler (KL) distance or relative entropy, is a fundamental quantity related to channel capacity. In communication, the KL divergence and the Bhattacharyya distance are used for signal selection. However, there is a gap between the sample similarity function k(α, β) and the probabilistic distance measure J(A, B). Only when the space Ω is a vector space, say Ω=ℝᵈ, and the similarity function is the regular inner product k(α, β)=αᵀβ, do the probabilistic distance measures J coincide with those defined on ℝᵈ. This is due to the equivalence between the inner product and the distance metric:
∥α−β∥² = αᵀα − 2αᵀβ + βᵀβ = k(α,α) − 2k(α,β) + k(β,β).
This leads to consideration of kernel methods, in which the sample similarity function k(α,β) evaluates the inner product in a nonlinear feature space ℝᶠ:
k(α,β)=φ(α)ᵀφ(β),  (1)
where φ: Ω→ℝᶠ is a nonlinear mapping and f is the dimension of the feature space. This is the so-called “kernel trick”. The function k(α,β) in Eq. (1) is referred to as a reproducing kernel function. The nonlinear feature space is referred to as the reproducing kernel Hilbert space (RKHS) ℋₖ induced by the kernel function k. For a function to be a reproducing kernel, it must be positive definite, i.e., satisfy Mercer's theorem. The distance metric in the RKHS can be evaluated as
∥φ(α)−φ(β)∥² = φ(α)ᵀφ(α) − 2φ(α)ᵀφ(β) + φ(β)ᵀφ(β) = k(α,α) − 2k(α,β) + k(β,β).  (2)
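By way of illustration, the RKHS distance of Eq. (2) may be evaluated from kernel values alone; in the following R sketch, a Gaussian RBF kernel is chosen merely as one example of a valid reproducing kernel, and the bandwidth sigma is a hypothetical parameter:

# Illustrative sketch (R): the squared RKHS distance of Eq. (2), computed
# using only kernel evaluations (the "kernel trick").
rbf <- function(a, b, sigma = 1) exp(-sum((a - b)^2) / (2 * sigma^2))
rkhs_dist2 <- function(a, b, k = rbf) k(a, a) - 2 * k(a, b) + k(b, b)
rkhs_dist2(c(0, 0), c(1, 1))  # squared feature-space distance between two samples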
Suppose that N(x;μ,Σ), with x∈ℝᵈ, is a multivariate Gaussian density defined as N(x;μ,Σ) = 1/√((2π)ᵈ|Σ|) exp{−½(x−μ)ᵀΣ⁻¹(x−μ)},
where |⋅| denotes the matrix determinant. With p1(x)=N(x;μ1,Σ1) and p2(x)=N(x;μ2,Σ2), several probabilistic distances between the two Gaussian densities can be defined, including the Bhattacharyya distance JB, the symmetric divergence JD, and the Mahalanobis distance JM.
When the covariance matrices of the two densities are the same, i.e., Σ1=Σ2=Σ, the Bhattacharyya distance and the symmetric divergence reduce to the Mahalanobis distance: JM=JD=8JB.
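By way of illustration, the following R sketch computes the Bhattacharyya distance and the squared Mahalanobis distance from their standard closed forms for Gaussian densities, and confirms the relationship JM=8JB when the covariances are equal; the numerical values are hypothetical:

# Illustrative sketch (R): probabilistic distances between two Gaussians.
bhattacharyya <- function(mu1, S1, mu2, S2) {
  S <- (S1 + S2) / 2
  d <- mu1 - mu2
  as.numeric(0.125 * t(d) %*% solve(S) %*% d) +
    0.5 * log(det(S) / sqrt(det(S1) * det(S2)))
}
mahalanobis_sq <- function(mu1, mu2, S) {
  d <- mu1 - mu2
  as.numeric(t(d) %*% solve(S) %*% d)
}
mu1 <- c(0, 0); mu2 <- c(1, 2)
S <- matrix(c(2, 0.3, 0.3, 1), 2, 2)
# With equal covariances the log-determinant term vanishes, so JM = JD = 8*JB:
8 * bhattacharyya(mu1, S, mu2, S)   # equals the value below
mahalanobis_sq(mu1, mu2, S)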
A support vector data description (SVDD) method based on radial basis function (RBF) kernels may be used, while reducing computational complexity in the training phase and the testing phase for anomaly detection. An advantage of support vector machines (SVMs) is that generalization ability can be improved by proper selection of kernels. Mahalanobis kernels exploit the data distribution information more than RBF kernels do. Trinh et al. (2017) develop an SVDD using Mahalanobis kernels with adjustable discriminant thresholds, with application to anomaly detection in a real wireless sensor network data set. An SVDD method aims to estimate a sphere with minimum volume that contains all (or most) of the data. It is also generally assumed that the training samples belong to an unknown distribution.
Gillespie et al. (2017) describe real-time analytics at the edge: identifying abnormal equipment behavior and filtering data near the edge for internet of things applications. A machine learning technique for anomaly detection uses the SAS® Event Stream Processing engine to analyze streaming sensor data and determine when performance of a turbofan engine deviates from normal operating conditions. Sensor readings from the engines are used to detect asset degradation and help with preventative maintenance applications. A single-class classification machine learning technique, called SVDD, is used to detect anomalies within the data. The technique shows how each engine degrades over its life cycle. This information can then be used in practice to provide alerts or trigger maintenance for the particular asset on an as-needed basis. Once the model was trained, the score code was deployed onto a thin client device running SAS® Event Stream Processing, to validate scoring the SVDD model on new observations and to simulate how the SVDD model might perform in Internet of Things (IoT) edge applications.
IoT processing at the edge, or edge computing, pushes the analytics from a central server to devices close to where the data is generated. As such, edge computing moves the decision making capability of analytics from centralized nodes closer to the source of the data. This can be important for several reasons. It can help to reduce latency for applications where speed is critical. And it can also reduce data transmission and storage costs through the use of intelligent data filtering at the edge device. In Gillespie et al.'s case, sensors from a fleet of turbofan engines were evaluated to determine engine degradation and future failure. A scoring model was constructed to be able to do real-time detection of anomalies indicating degradation.
SVDD is a machine learning technique that can be used for single-class classification. The model creates a minimum-radius hypersphere around the training data used to build the model. The hypersphere is made flexible through the use of kernel functions (Chaudhuri et al. 2016). As such, SVDD is able to provide a flexible data description on a wide variety of data sets. The methodology also does not require any assumptions regarding normality of the data, which can be a limitation of other anomaly detection techniques associated with multivariate statistical process control. If the data used to build the model represent normal conditions, then observations that lie outside of the hypersphere can represent possible anomalies. These might be anomalies that have previously occurred or new anomalies that would not have been found in historical data. Since the model is trained with data that is considered normal, the model can score an observation as abnormal even if no abnormal example has been seen before.
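The work described herein uses a SAS SVDD implementation; purely as a non-limiting illustration of the same single-class idea, a one-class SVM with an RBF kernel may be sketched in R using the kernlab package (the data and parameter values are hypothetical):

# Illustrative sketch (R): single-class classification with a one-class SVM,
# an SVDD-like flexible boundary with an RBF kernel (kernlab package).
library(kernlab)
set.seed(1)
normal <- cbind(rnorm(500, 80, 2), rnorm(500, 60, 3))   # hypothetical normal sensor data
fit <- ksvm(normal, type = "one-svc", kernel = "rbfdot", nu = 0.05)
new_obs <- rbind(c(81, 61), c(95, 40))
predict(fit, new_obs)                      # TRUE = inside boundary, FALSE = possible anomaly
predict(fit, new_obs, type = "decision")   # signed, distance-like score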
To train the model, data from a small set of engines within the beginning of the time series that were assumed to be operating under normal conditions were sampled. The SVDD algorithm was constructed using a range of normal operating conditions for the equipment or system. For example, a haul truck within a mine might have very different sensor data readings when it is traveling on a flat road with no payload and when it is traveling up a hill with ore. However, both readings represent normal operating conditions for the piece of equipment. The model was trained using the svddTrain action from the svdd action set within SAS Visual Data Mining and Machine Learning. The ASTORE scoring code generated by the action was then saved to be used to score new observations using SAS Event Stream Processing on a gateway device. A Dell Wyse 3290 was set up with Wind River Linux and SAS Event Stream Processing (ESP). An ESP model was built to take the incoming observations, score them using the ASTORE code generated by the VDMML program and return a scored distance metric for each observation. This metric could then be used to monitor degradation and create a flag that could trigger an alert if above a specified threshold.
The results from Gillespie et al. revealed that each engine has a relatively stable normal operating state for the first portion of its useful life, followed by a sloped upward trend in the distance metric leading up to a failure point. This upward trend in the data indicated that the observations move further and further from the centroid of the normal hypersphere created by the SVDD model. As such, the engine operating conditions moved increasingly further from normal operating behavior. With increasing distance indicating potential degradation, an alert can be set to be triggered if the scored distance begins to rise above a pre-determined threshold or if the moving average of the scored distance deviates a certain percentage from the initial operating conditions of the asset. This can be tailored to the specific application that the model is used to monitor.
Brandsaeter et al. (2017) provide an on-line anomaly detection methodology applied in the maritime industry and propose modifications to an anomaly detection methodology based on signal reconstruction followed by residuals analysis. The reconstructions are made using Auto Associative Kernel Regression (AAKR), where the query observations are compared to historical observations called memory vectors representing normal operation. When the data set of historical observations grows large, the naive approach where all observations are used as memory vectors leads to unacceptably large computational loads; hence, a reduced set of memory vectors should be intelligently selected. The residuals between the observed and the reconstructed signals are analyzed using standard Sequential Probability Ratio Tests (SPRT), where appropriate alarms are raised based on the sequential behavior of the residuals. Brandsaeter et al. employ a cluster-based method to select memory vectors to be considered by the AAKR, which reduces computation time; a generalization of the distance measure, which makes it possible to distinguish between explanatory and response variables; and a regional credibility estimation used in the residuals analysis, to let the time used to identify whether a sequence of query vectors represents an anomalous state depend on the amount of data situated close to or surrounding the query vector. The anomaly detection method was tested for analysis of a marine diesel engine in normal operation, and the data were manually modified to synthesize faults.
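By way of illustration, the AAKR reconstruction step may be sketched in R, here assuming a Gaussian kernel over Euclidean distances to the memory vectors; the bandwidth h and the data are hypothetical:

# Illustrative sketch (R): AAKR signal reconstruction. A query observation is
# reconstructed as a kernel-weighted average of memory vectors representing
# normal operation; large residuals suggest an anomalous state.
aakr_reconstruct <- function(query, memory, h = 1) {
  d2 <- rowSums(sweep(memory, 2, query)^2)   # squared distances to memory vectors
  w <- exp(-d2 / (2 * h^2))                  # Gaussian kernel weights
  colSums(memory * (w / sum(w)))             # weighted-average reconstruction
}
set.seed(1)
memory <- matrix(rnorm(200), ncol = 2)       # hypothetical normal-operation history
query <- c(0.1, -0.2)
residual <- query - aakr_reconstruct(query, memory)  # input to residuals analysis (e.g., SPRT)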
Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior (Chandola et al., 2009). In other words, anomalies can be defined as observations, or subsets of observations, which are inconsistent with the remainder of the data set (Hodge and Austin, 2004; Barnett et al., 1994). Depending on the field of research and application, anomalies are also often referred to as outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants (Hodge and Austin, 2004; Chandola et al., 2009). Anomaly detection is related to, but distinct from, noise removal (Chandola et al., 2009).
The fundamental approaches to the problem of anomaly detection can be divided into three categories (Hodge and Austin, 2004; Chandola et al., 2009):
Supervised anomaly detection. Availability of a training data set with labelled instances for normal and anomalous behavior is assumed. Typically, predictive models are built for normal and anomalous behavior, and unseen data are assigned to one of the classes.
Unsupervised anomaly detection. Here, the training data set is not labelled, and an implicit assumption is that the normal instances are far more frequent than anomalies in the test data. If this assumption is not true, then such techniques suffer from a high false alarm rate.
Semi-supervised anomaly detection. In semi-supervised anomaly detection, the training data only includes normal data. A typical anomaly detection approach is to build a model for the class corresponding to normal behavior and use the model to identify anomalies in the test data. Since the semi-supervised and unsupervised methods do not require labels for the anomaly class, they are more widely applicable than supervised techniques.
Ahmad et al. (2017) discuss unsupervised real-time anomaly detection for streaming data. Streaming data inherently exhibits concept drift, favoring algorithms that learn continuously. Furthermore, the massive number of independent streams in practice requires that anomaly detectors be fully automated. Ahmad et al. propose an anomaly detection technique based on an online sequence memory algorithm called Hierarchical Temporal Memory (HTM). They define an anomaly as a point in time where the behavior of the system is unusual and significantly different from previous, normal behavior. An anomaly may signify a negative change in the system, like a fluctuation in the turbine rotation frequency of a jet engine, possibly indicating an imminent failure. An anomaly can also be positive, like an abnormally high number of web clicks on a new product page, implying stronger than normal demand. Either way, anomalies in data identify abnormal behavior with potentially useful information. Anomalies can be spatial, where an individual data instance can be considered anomalous with respect to the rest of data, independent of where it occurs in the data stream, or contextual, if the temporal sequence of data is relevant; i.e., a data instance is anomalous only in a specific temporal context, but not otherwise. Temporal anomalies are often subtle and hard to detect in real data streams. Detecting temporal anomalies in practical applications is valuable as they can serve as an early warning for problems with the underlying system.
Streaming applications impose unique constraints and challenges for machine learning models. These applications involve analyzing a continuous sequence of data occurring in real-time. In contrast to batch processing, the full dataset is not available. The system observes each data record in sequential order as it is collected, and any processing or learning must be done in an online fashion. At each point in time we would like to determine whether the behavior of the system is unusual. The determination is preferably made in real-time. That is, before seeing the next input, the algorithm must consider the current and previous states to decide whether the system behavior is anomalous, as well as perform any model updates and retraining. Unlike batch processing, data is not split into train/test sets, and algorithms cannot look ahead. Practical applications impose additional constraints on the problem. In many scenarios the statistics of the system can change over time, a problem known as concept drift.
Some anomaly detection algorithms are partially online. They either have an initial phase of offline learning or rely on look-ahead to flag previously-seen anomalous data. Most clustering-based approaches fall under the umbrella of such algorithms. Some examples include Distributed Matching-based Grouping Algorithm (DMGA), Online Novelty and Drift Detection Algorithm (OLINDDA), and Multi-class learNing Algorithm for data Streams (MINAS). Another example is self-adaptive and dynamic k-means that uses training data to learn weights prior to anomaly detection. Kernel-based recursive least squares (KRLS) also violates the principle of no look-ahead as it resolves temporarily flagged data instances a few time steps later to decide if they were anomalous. However, some kernel methods, such as EXPoSE, adhere to our criteria of real-time anomaly detection.
For streaming anomaly detection, the majority of methods used in practice are statistical techniques that are computationally lightweight. These techniques include sliding thresholds, outlier tests such as extreme studentized deviate (ESD, also known as Grubbs') and k-sigma, changepoint detection, statistical hypothesis testing, and exponential smoothing such as Holt-Winters. Typicality and eccentricity analysis is an efficient technique that requires no user-defined parameters. Most of these techniques focus on spatial anomalies, limiting their usefulness in applications with temporal dependencies.
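Purely as a non-limiting illustration of such lightweight techniques, a k-sigma sliding-threshold detector may be sketched in R; the window length and k are hypothetical tuning parameters:

# Illustrative sketch (R): k-sigma sliding-window detector. A point is flagged
# when it lies more than k standard deviations from the trailing-window mean.
k_sigma_flags <- function(x, window = 60, k = 3) {
  flags <- logical(length(x))
  for (i in (window + 1):length(x)) {
    w <- x[(i - window):(i - 1)]
    flags[i] <- abs(x[i] - mean(w)) > k * max(sd(w), 1e-8)  # guard against zero variance
  }
  flags
}
set.seed(1)
x <- c(rnorm(300), 8)       # hypothetical stream with one injected spike
which(k_sigma_flags(x))     # includes position 301 (the spike)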
More advanced time-series modeling and forecasting models are capable of detecting temporal anomalies in complex scenarios. ARIMA is a general purpose technique for modeling temporal data with seasonality. It is effective at detecting anomalies in data with regular daily or weekly patterns. Extensions of ARIMA enable the automatic determination of seasonality for certain applications. A more recent example capable of handling temporal anomalies is based on relative entropy. Model-based approaches have been developed for specific use cases, but require explicit domain knowledge and are not generalizable. Domain-specific examples include anomaly detection in aircraft engine measurements, cloud datacenter temperatures, and ATM fraud detection. Kalman filtering is a common technique, but the parameter tuning often requires domain knowledge and choosing specific residual error models. Model-based approaches are often computationally efficient but their lack of generalizability limits their applicability to general streaming applications.
There are a number of other restrictions that can make methods unsuitable for real-time streaming anomaly detection, such as computational constraints that impede scalability. An example is Lytics Anomalyzer, which runs in O(n²), limiting its usefulness in practice where streams are arbitrarily long. Dimensionality is another factor that can make some methods restrictive. For instance, online variants of principal component analysis (PCA), such as osPCA or window-based PCA, can only work with high-dimensional, multivariate data streams that can be projected onto a low-dimensional space. Techniques that require data labels, such as supervised classification-based methods, are typically unsuitable for real-time anomaly detection and continuous learning.
Ahmad et al. (2017) show how to use Hierarchical Temporal Memory (HTM) networks to detect anomalies on a variety of data streams. The resulting system is efficient, extremely tolerant to noisy data, continuously adapts to changes in the statistics of the data, and detects subtle temporal anomalies while minimizing false positives. Based on known properties of cortical neurons, HTM is a theoretical framework for sequence learning in the cortex. HTM implementations operate in real-time and have been shown to work well for prediction tasks. HTM networks continuously learn and model the spatiotemporal characteristics of their inputs, but they do not directly model anomalies and do not output a usable anomaly score. Rather than thresholding the prediction error directly, Ahmad et al. model the distribution of error values as an indirect metric and use this distribution to check the likelihood that the current state is anomalous. The anomaly likelihood is thus a probabilistic metric defining how anomalous the current state is based on the prediction history of the HTM model. To compute the anomaly likelihood, a window of the last W error values is maintained, and the distribution is modelled as a rolling normal distribution where the sample mean, μt, and variance, σt², are continuously updated from previous error values. Then, a recent short-term average of prediction errors is computed, and a threshold is applied to the Gaussian tail probability (Q-function) to decide whether or not to declare an anomaly. Since this involves thresholding a tail probability, there is an inherent upper limit on the number of alerts and a corresponding upper bound on the number of false positives. The anomaly likelihood is based on the distribution of prediction errors, not on the distribution of underlying metric values. As such, it is a measure of how well the model is able to predict, relative to the recent history.
In clean, predictable scenarios, the anomaly likelihood of the HTM anomaly detection network behaves similarly to the prediction error. In these cases, the distribution of errors will have very small variance and will be centered near 0. Any spike in the prediction error will similarly lead to a corresponding spike in likelihood of anomaly. However, in scenarios with some inherent randomness or noise, the variance will be wider and the mean further from 0. A single spike in the prediction error will not lead to a significant increase in anomaly likelihood but a series of spikes will. A scenario that goes from wildly random to completely predictable will also trigger an anomaly.
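A non-limiting sketch of this anomaly-likelihood computation in R, assuming a rolling Gaussian model of recent prediction errors (the window lengths W and Ws and the alert threshold are hypothetical parameters):

# Illustrative sketch (R): anomaly likelihood from the distribution of
# prediction errors. The likelihood is 1 minus the Gaussian tail probability
# (Q-function) of a short-term average error under the rolling error model.
anomaly_likelihood <- function(errors, W = 100, Ws = 10) {
  win <- tail(errors, W)            # last W prediction errors
  short <- mean(tail(errors, Ws))   # recent short-term average error
  pnorm(short, mean = mean(win), sd = sd(win))  # = 1 - Q(short)
}
# An anomaly may be declared when the likelihood is very close to 1, e.g.:
# if (anomaly_likelihood(err_history) >= 1 - 1e-5) raise_alert()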
In some embodiments, the present technology provides systems and methods for capturing a stream of data relating to performance of a physical system, processing the stream with respect to a statistical model generated using machine learning, and predicting the presence of an anomaly representing impending or actual hardware deviation from a normal state, distinguished from the hardware in a normal state, in a rigorous environment of use.
It is often necessary to decide which one of a finite set of possible Gaussian processes is being observed. For example, it may be important to decide whether a normal state of operation is being observed with its range of statistical variations, or an aberrant state of operation is being observed, which may assume not only a different nominal operating point, but also a statistical variance that is quantitatively different from the normal state. Indeed, the normal and aberrational states may differ only in their statistical profiles, with all nominal values having, or being controlled to maintain, a nominal value. The ability to make such decisions can depend on the distances in n-dimensional space between the Gaussian processes, where n is the number of features that describe the processes; if the processes are close (similar) to each other, the decision can be difficult. The distances may be measured using a divergence, the Bhattacharyya distance, or the Mahalanobis distance, for example. In addition, these distances can be described as or converted to vectors in n-dimensional space by determining angles from the corresponding axis (e.g., the n Mahalanobis angles between the vectors of Mahalanobis distances, spanning from the origin to multi-dimensional standardized error points, and the corresponding axis of standardized errors). Some or all of these distances and angles can be used to evaluate whether a system is in a normal or aberrant state of operation, and can also be used as input to models that classify an aberrant state of operation as a particular kind of engine failure in accordance with some embodiments of the presently disclosed technology.
In many cases, engine parameter(s) being monitored and analyzed for anomaly detection are assumed to be correlated with some other engine parameter(s) being monitored. For example, if y is the engine sensor value being analyzed for near real-time predictions and x1, x2, . . . , xn are other engine sensors also being monitored, there exists a function ƒ1 such that y=ƒ1(x1, x2, . . . , xn), where y is the dependent variable and x1, x2, . . . , xn are independent variables; that is, ƒ1: ℝⁿ→ℝ¹.
In some embodiments, the machine being analyzed is a diesel engine within a marine vessel, and the analysis system's goal is to identify diesel engine operational anomalies and/or diesel engine sensor anomalies at near real-time latency, using an edge device installed at or near the engine. Of course, other types of vehicles, engines, or machines may similarly be subject to the monitoring and analysis.
The edge device may interface with the engine's electronic control module/unit (ECM/ECU) and collect engine sensor data as a time series (e.g., engine revolutions per minute (RPM), load percent, coolant temperature, coolant pressure, oil temperature, oil pressure, fuel pressure, fuel actuator percentage, etc.), as well as speed and location data from an internal GPS/DGPS or the vessel's GPS/DGPS.
The edge device may, for example, collect all of these sensor data at an approximate rate of sixty samples per minute, and align the data to each second's timestamp (e.g., 12:00:00, 12:00:01, 12:00:02, . . . ). If data can be recorded at a higher frequency, an aggregate (e.g., an average value) may be calculated for each second or other appropriate period. The average value (i.e., arithmetic mean) for each minute may then be calculated, creating the minute-averaged time series (e.g., 12:00:00, 12:01:00, 12:02:00, . . . ).
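A non-limiting sketch of this alignment and aggregation step in R, with hypothetical timestamps and sensor values:

# Illustrative sketch (R): aggregate 1 Hz samples into a minute-averaged series.
df <- data.frame(
  ts = as.POSIXct("2020-01-01 12:00:00", tz = "UTC") + 0:179,  # three minutes of 1 Hz data
  value = rnorm(180, mean = 80)                                # hypothetical sensor readings
)
df$minute <- format(df$ts, "%Y-%m-%d %H:%M:00")                # truncate to minute boundary
minute_avg <- aggregate(value ~ minute, data = df, FUN = mean) # one mean per minute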
In some embodiments, minute-averaged data were found to be more stable for developing statistical models and predicting anomalies than raw, high-frequency samples. However, in some cases, the inter-sample noise can be processed by subsequent stages of the algorithm.
The edge device collects an n-dimensional engine data time series that may include, but is not limited to, timestamps (ts) and the following engine parameters: engine speed (rpm), engine load percentage (load), coolant temperature (coolant_temperature), coolant pressure (coolant_pressure), oil temperature (oil_temperature), oil pressure (oil_pressure), fuel pressure (fuel_pressure), and fuel actuator percentage (fuel_actuator_percentage).
In some cases, ambient temperature, barometric pressure, humidity, location, maintenance information, or other data are collected.
In a variance analysis of diesel engine data, most of the engine parameters, including coolant temperature, are found to have strong correlation with engine RPM and engine load percentage within a bounded range of engine speed and when the engine is in a steady state, i.e., when RPM and engine load are stable. Thus, inside that bounded region of engine RPM (e.g., higher than idle engine RPM), there exists a function ƒ1 such that:
coolant temperature=ƒ1(rpm, load)
ƒ1: ℝⁿ→ℝᵐ.
In this case n equals two (rpm and load) and m equals one (coolant temperature).
In other words, ƒ1 is a map that allows for prediction of a single dependent variable from two independent variables. Similarly, maps ƒ2, . . . , ƒ6 can be defined for the other correlated engine parameters (coolant pressure, oil temperature, oil pressure, fuel pressure, and fuel actuator percentage).
Grouping these maps into one map leads to a multi-dimensional map (i.e., the model) such that ƒ: ℝⁿ→ℝᵐ, where n equals two (rpm, load) and m equals six (coolant temperature, coolant pressure, oil temperature, oil pressure, fuel pressure and fuel actuator percentage) in this case. Critically, many maps are grouped into a single map with the same input variables, enabling potentially many correlated variables (i.e., a tensor of variables) to be predicted within a bounded range. Note that the specific independent variables need not be engine RPM and engine load, and need not be limited to two variables. For example, engine operating hours could be added as an independent variable in the map to account for engine degradation with operating time.
In order to create an engine model, a training time period is selected in which the engine had no apparent operational issues. In some embodiments, a machine learning algorithm is used to generate the engine models directly on the edge device, on a local or remote server, or in the cloud. A modeling technique can be selected that offers low model bias (e.g., a spline, a neural network, support vector machines (SVM), and/or a Generalized Additive Model (GAM)). See:
U.S. Pat. Nos. 1,006,1887; 10,126,309; 10,154,624; 10,168,337; 10,187,899; 6,006,182; 6,064,960; 6,366,884; 6,401,070; 6,553,344; 6,785,652; 7,039,654; 7,144,869; 7,379,890; 7,389,114; 7,401,057; 7,426,499; 7,547,683; 7,561,972; 7,561,973; 7,583,961; 7,653,491; 7,693,683; 7,698,213; 7,702,576; 7,729,864; 7,730,063; 7,774,272; 7,813,981; 7,873,567; 7,873,634; 7,970,640; 8,005,620; 8,126,653; 8,152,750; 8,185,486; 8,401,798; 8,412,461; 8,498,915; 8,515,719; 8,566,070; 8,635,029; 8,694,455; 8,713,025; 8,724,866; 8,731,728; 8,843,356; 8,929,568; 8,992,453; 9,020,866; 9,037,256; 9,075,796; 9,092,391; 9,103,826; 9,204,319; 9,205,064; 9,297,814; 9,428,767; 9,471,884; 9,483,531; 9,534,234; 9,574,209; 9,580,697; 9,619,883; 9,886,545; 9,900,790; 9,903,193; 9,955,488; 9,992,123; 20010009904; 20010034686; 20020001574; 20020138012; 20020138270; 20030023951; 20030093277; 20040073414; 20040088239; 20040110697; 20040172319; 20040199445; 20040210509; 20040215551; 20040225629; 20050071266; 20050075597; 20050096963; 20050144106; 20050176442; 20050245252; 20050246314; 20050251468; 20060059028; 20060101017; 20060111849; 20060122816; 20060136184; 20060184473; 20060189553; 20060241869; 20070038386; 20070043656; 20070067195; 20070105804; 20070166707; 20070185656; 20070233679; 20080015871; 20080027769; 20080027841; 20080050357; 20080114564; 20080140549; 20080228744; 20080256069; 20080306804; 20080313073; 20080319897; 20090018891; 20090030771; 20090037402; 20090037410; 20090043637; 20090050492; 20090070182; 20090132448; 20090171740; 20090220965; 20090271342; 20090313041; 20100028870; 20100029493; 20100042438; 20100070455; 20100082617; 20100100331; 20100114793; 20100293130; 20110054949; 20110059860; 20110064747; 20110075920; 20110111419; 20110123986; 20110123987; 20110166844; 20110230366; 20110276828; 20110287946; 20120010867; 20120066217; 20120136629; 20120150032; 20120158633; 20120207771; 20120220958; 20120230515; 20120258874; 20120283885; 20120284207; 20120290505; 20120303408; 20120303504; 20130004473; 20130030584; 20130054486; 20130060305; 20130073442; 20130096892; 20130103570; 20130132163; 20130183664; 20130185226; 20130259847; 20130266557; 20130315885; 20140006013; 20140032186; 20140100128; 20140172444; 20140193919; 20140278967; 20140343959; 20150023949; 20150235143; 20150240305; 20150289149; 20150291975; 20150291976; 20150291977; 20150316562; 20150317449; 20150324548; 20150347922; 20160003845; 20160042513; 20160117327; 20160145693; 20160148237; 20160171398; 20160196587; 20160225073; 20160225074; 20160239919; 20160282941; 20160333328; 20160340691; 20170046347; 20170126009; 20170132537; 20170137879; 20170191134; 20170244777; 20170286594; 20170290024; 20170306745; 20170308672; 20170308846; 20180006957; 20180017564; 20180018683; 20180035605; 20180046926; 20180060458; 20180060738; 20180060744; 20180120133; 20180122020; 20180189564; 20180227930; 20180260515; 20180260717; 20180262433; 20180263606; 20180275146; 20180282736; 20180293511; 20180334721; 20180341958; 20180349514; 20190010554; and 20190024497.
In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. Generalized linear models unify various other statistical models, including linear regression, logistic regression and Poisson regression, and employ an iteratively reweighted least squares method for maximum likelihood estimation of the model parameters. See:
U.S. Pat. No. 1,000,2367; 10,006,088; 10,009,366; 10,013,701; 10,013,721; 10,018,631; 10,019,727; 10,021,426; 10,023,877; 10,036,074; 10,036,638; 10,037,393; 10,038,697; 10,047,358; 10,058,519; 10,062,121; 10,070,166; 10,070,220; 10,071,151; 10,080,774; 10,092,509; 10,098,569; 10,098,908; 10,100,092; 10,101,340; 10,111,888; 10,113,198; 10,113,200; 10,114,915; 10,117,868; 10,131,949; 10,142,788; 10,147,173; 10,157,509; 10,172,363; 10,175,387; 10,181,010; 5,529,901; 5,641,689; 5,667,541; 5,770,606; 5,915,036; 5,985,889; 6,043,037; 6,121,276; 6,132,974; 6,140,057; 6,200,983; 6,226,393; 6,306,437; 6,411,729; 6,444,870; 6,519,599; 6,566,368; 6,633,857; 6,662,185; 6,684,252; 6,703,231; 6,704,718; 6,879,944; 6,895,083; 6,939,670; 7,020,578; 7,043,287; 7,069,258; 7,117,185; 7,179,797; 7,208,517; 7,228,171; 7,238,799; 7,268,137; 7,306,913; 7,309,598; 7,337,033; 7,346,507; 7,445,896; 7,473,687; 7,482,117; 7,494,783; 7,516,572; 7,550,504; 7,590,516; 7,592,507; 7,593,815; 7,625,699; 7,651,840; 7,662,564; 7,685,084; 7,693,683; 7,695,911; 7,695,916; 7,700,074; 7,702,482; 7,709,460; 7,711,488; 7,727,725; 7,743,009; 7,747,392; 7,751,984; 7,781,168; 7,799,530; 7,807,138; 7,811,794; 7,816,083; 7,820,380; 7,829,282; 7,833,706; 7,840,408; 7,853,456; 7,863,021; 7,888,016; 7,888,461; 7,888,486; 7,890,403; 7,893,041; 7,904,135; 7,910,107; 7,910,303; 7,913,556; 7,915,244; 7,921,069; 7,933,741; 7,947,451; 7,953,676; 7,977,052; 7,987,148; 7,993,833; 7,996,342; 8,010,476; 8,017,317; 8,024,125; 8,027,947; 8,037,043; 8,039,212; 8,071,291; 8,071,302; 8,094,713; 8,103,537; 8,135,548; 8,148,070; 8,153,366; 8,211,638; 8,214,315; 8,216,786; 8,217,078; 8,222,270; 8,227,189; 8,234,150; 8,234,151; 8,236,816; 8,283,440; 8,291,069; 8,299,109; 8,311,849; 8,328,950; 8,346,688; 8,349,327; 8,351,688; 8,364,627; 8,372,625; 8,374,837; 8,383,338; 8,412,465; 8,415,093; 8,434,356; 8,452,621; 8,452,638; 8,455,468; 8,461,849; 8,463,582; 8,465,980; 8,473,249; 8,476,077; 8,489,499; 8,496,934; 8,497,084; 8,501,718; 8,501,719; 8,514,928; 8,515,719; 8,521,294; 8,527,352; 8,530,831; 8,543,428; 8,563,295; 8,566,070; 8,568,995; 8,569,574; 8,600,870; 8,614,060; 8,618,164; 8,626,697; 8,639,618; 8,645,298; 8,647,819; 8,652,776; 8,669,063; 8,682,812; 8,682,876; 8,706,589; 8,712,937; 8,715,704; 8,715,943; 8,718,958; 8,725,456; 8,725,541; 8,731,977; 8,732,534; 8,741,635; 8,741,956; 8,754,805; 8,769,094; 8,787,638; 8,799,202; 8,805,619; 8,811,670; 8,812,362; 8,822,149; 8,824,762; 8,871,901; 8,877,174; 8,889,662; 8,892,409; 8,903,192; 8,903,531; 8,911,958; 8,912,512; 8,956,608; 8,962,680; 8,965,625; 8,975,022; 8,977,421; 8,987,686; 9,011,877; 9,030,565; 9,034,401; 9,036,910; 9,037,256; 9,040,023; 9,053,537; 9,056,115; 9,061,004; 9,061,055; 9,069,352; 9,072,496; 9,074,257; 9,080,212; 9,106,718; 9,116,722; 9,128,991; 9,132,110; 9,186,107; 9,200,324; 9,205,092; 9,207,247; 9,208,209; 9,210,446; 9,211,103; 9,216,010; 9,216,213; 9,226,518; 9,232,217; 9,243,493; 9,275,353; 9,292,550; 9,361,274; 9,370,501; 9,370,509; 9,371,565; 9,374,671; 9,375,412; 9,375,436; 9,389,235; 9,394,345; 9,399,061; 9,402,871; 9,415,029; 9,451,920; 9,468,541; 9,503,467; 9,534,258; 9,536,214; 9,539,223; 9,542,939; 9,555,069; 9,555,251; 9,563,921; 9,579,337; 9,585,868; 9,615,585; 9,625,646; 9,633,401; 9,639,807; 9,639,902; 9,650,678; 9,663,824; 9,668,104; 9,672,474; 9,674,210; 9,675,642; 9,679,378; 9,681,835; 9,683,832; 9,701,721; 9,710,767; 9,717,459; 9,727,616; 9,729,568; 9,734,122; 9,734,290; 9,740,979; 9,746,479; 9,757,388; 9,758,828; 9,760,907; 9,769,619; 9,775,818; 9,777,327; 
9,786,012; 9,790,256; 9,791,460; 9,792,741; 9,795,335; 9,801,857; 9,801,920; 9,809,854; 9,811,794; 9,836,577; 9,870,519; 9,871,927; 9,881,339; 9,882,660; 9,886,771; 9,892,420; 9,926,368; 9,926,593; 9,932,637; 9,934,239; 9,938,576; 9,949,659; 9,949,693; 9,951,348; 9,955,190; 9,959,285; 9,961,488; 9,967,714; 9,972,014; 9,974,773; 9,976,182; 9,982,301; 9,983,216; 9,986,527; 9,988,624; 9,990,648; 9,990,649; 9,993,735; 20020016699; 20020055457; 20020099686; 20020184272; 20030009295; 20030021848; 20030023951; 20030050265; 20030073715; 20030078738; 20030104499; 20030139963; 20030166017; 20030166026; 20030170660; 20030170700; 20030171685; 20030171876; 20030180764; 20030190602; 20030198650; 20030199685; 20030220775; 20040063095; 20040063655; 20040073414; 20040092493; 20040115688; 20040116409; 20040116434; 20040127799; 20040138826; 20040142890; 20040157783; 20040166519; 20040265849; 20050002950; 20050026169; 20050080613; 20050096360; 20050113306; 20050113307; 20050164206; 20050171923; 20050272054; 20050282201; 20050287559; 20060024700; 20060035867; 20060036497; 20060084070; 20060084081; 20060142983; 20060143071; 20060147420; 20060149522; 20060164997; 20060223093; 20060228715; 20060234262; 20060278241; 20060286571; 20060292547; 20070026426; 20070031846; 20070031847; 20070031848; 20070036773; 20070037208; 20070037241; 20070042382; 20070049644; 20070054278; 20070059710; 20070065843; 20070072821; 20070078117; 20070078434; 20070087000; 20070088248; 20070123487; 20070129948; 20070167727; 20070190056; 20070202518; 20070208600; 20070208640; 20070239439; 20070254289; 20070254369; 20070255113; 20070259954; 20070275881; 20080032628; 20080033589; 20080038230; 20080050732; 20080050733; 20080051318; 20080057500; 20080059072; 20080076120; 20080103892; 20080108081; 20080108713; 20080114564; 20080127545; 20080139402; 20080160046; 20080166348; 20080172205; 20080176266; 20080177592; 20080183394; 20080195596; 20080213745; 20080241846; 20080248476; 20080286796; 20080299554; 20080301077; 20080305967; 20080306034; 20080311572; 20080318219; 20080318914; 20090006363; 20090035768; 20090035769; 20090035772; 20090053745; 20090055139; 20090070081; 20090076890; 20090087909; 20090089022; 20090104620; 20090107510; 20090112752; 20090118217; 20090119357; 20090123441; 20090125466; 20090125916; 20090130682; 20090131702; 20090132453; 20090136481; 20090137417; 20090157409; 20090162346; 20090162348; 20090170111; 20090175830; 20090176235; 20090176857; 20090181384; 20090186352; 20090196875; 20090210363; 20090221438; 20090221620; 20090226420; 20090233299; 20090253952; 20090258003; 20090264453; 20090270332; 20090276189; 20090280566; 20090285827; 20090298082; 20090306950; 20090308600; 20090312410; 20090325920; 20100003691; 20100008934; 20100010336; 20100035983; 20100047798; 20100048525; 20100048679; 20100063851; 20100076949; 20100113407; 20100120040; 20100132058; 20100136553; 20100136579; 20100137409; 20100151468; 20100174336; 20100183574; 20100183610; 20100184040; 20100190172; 20100191216; 20100196400; 20100197033; 20100203507; 20100203508; 20100215645; 20100216154; 20100216655; 20100217648; 20100222225; 20100249188; 20100261187; 20100268680; 20100272713; 20100278796; 20100284989; 20100285579; 20100310499; 20100310543; 20100330187; 20110004509; 20110021555; 20110027275; 20110028333; 20110054356; 20110065981; 20110070587; 20110071033; 20110077194; 20110077215; 20110077931; 20110079077; 20110086349; 20110086371; 20110086796; 20110091994; 20110093288; 20110104121; 20110106736; 20110118539; 20110123100; 20110124119; 20110129831; 20110130303; 
20110131160; 20110135637; 20110136260; 20110137851; 20110150323; 20110173116; 20110189648; 20110207659; 20110207708; 20110208738; 20110213746; 20110224181; 20110225037; 20110251272; 20110251995; 20110257216; 20110257217; 20110257218; 20110257219; 20110263633; 20110263634; 20110263635; 20110263636; 20110263637; 20110269735; 20110276828; 20110284029; 20110293626; 20110302823; 20110307303; 20110311565; 20110319811; 20120003212; 20120010274; 20120016106; 20120016436; 20120030082; 20120039864; 20120046263; 20120064512; 20120065758; 20120071357; 20120072781; 20120082678; 20120093376; 20120101965; 20120107370; 20120108651; 20120114211; 20120114620; 20120121618; 20120128223; 20120128702; 20120136629; 20120154149; 20120156215; 20120163656; 20120165221; 20120166291; 20120173200; 20120184605; 20120209565; 20120209697; 20120220055; 20120239489; 20120244145; 20120245133; 20120250963; 20120252050; 20120252695; 20120257164; 20120258884; 20120264692; 20120265978; 20120269846; 20120276528; 20120280146; 20120301407; 20120310619; 20120315655; 20120316833; 20120330720; 20130012860; 20130024124; 20130024269; 20130029327; 20130029384; 20130030051; 20130040922; 20130040923; 20130041034; 20130045198; 20130045958; 20130058914; 20130059827; 20130059915; 20130060305; 20130060549; 20130061339; 20130065870; 20130071033; 20130073213; 20130078627; 20130080101; 20130081158; 20130102918; 20130103615; 20130109583; 20130112895; 20130118532; 20130129764; 20130130923; 20130138481; 20130143215; 20130149290; 20130151429; 20130156767; 20130171296; 20130197081; 20130197738; 20130197830; 20130198203; 20130204664; 20130204833; 20130209486; 20130210855; 20130211229; 20130212168; 20130216551; 20130225439; 20130237438; 20130237447; 20130240722; 20130244233; 20130244902; 20130244965; 20130252267; 20130252822; 20130262425; 20130271668; 20130273103; 20130274195; 20130280241; 20130288913; 20130303558; 20130303939; 20130310261; 20130315894; 20130325498; 20130332231; 20130332338; 20130346023; 20130346039; 20130346844; 20140004075; 20140004510; 20140011206; 20140011787; 20140038930; 20140058528; 20140072550; 20140072957; 20140080784; 20140081675; 20140086920; 20140087960; 20140088406; 20140093127; 20140093974; 20140095251; 20140100989; 20140106370; 20140107850; 20140114746; 20140114880; 20140120137; 20140120533; 20140127213; 20140128362; 20140134186; 20140134625; 20140135225; 20140141988; 20140142861; 20140143134; 20140148505; 20140156231; 20140156571; 20140163096; 20140170069; 20140171337; 20140171382; 20140172507; 20140178348; 20140186333; 20140188918; 20140199290; 20140200953; 20140200999; 20140213533; 20140219968; 20140221484; 20140234291; 20140234347; 20140235605; 20140236965; 20140242180; 20140244216; 20140249447; 20140249862; 20140256576; 20140258355; 20140267700; 20140271672; 20140274885; 20140278148; 20140279053; 20140279306; 20140286935; 20140294903; 20140303481; 20140316217; 20140323897; 20140324521; 20140336965; 20140343786; 20140349984; 20140365144; 20140365276; 20140376645; 20140378334; 20150001420; 20150002845; 20150004641; 20150005176; 20150006605; 20150007181; 20150018632; 20150019262; 20150025328; 20150031578; 20150031969; 20150032598; 20150032675; 20150039265; 20150051896; 20150051949; 20150056212; 20150064194; 20150064195; 20150064670; 20150066738; 20150072434; 20150072879; 20150073306; 20150078460; 20150088783; 20150089399; 20150100407; 20150100408; 20150100409; 20150100410; 20150100411; 20150100412; 20150111775; 20150112874; 20150119759; 20150120758; 20150142331; 20150152176; 20150167062; 20150169840; 20150178756; 
20150190367; 20150190436; 20150191787; 20150205756; 20150209586; 20150213192; 20150215127; 20150216164; 20150216922; 20150220487; 20150228031; 20150228076; 20150231191; 20150232944; 20150240304; 20150240314; 20150250816; 20150259744; 20150262511; 20150272464; 20150287143; 20150292010; 20150292016; 20150299798; 20150302529; 20150306160; 20150307614; 20150320707; 20150320708; 20150328174; 20150332013; 20150337373; 20150341379; 20150348095; 20150356458; 20150359781; 20150361494; 20150366830; 20150377909; 20150378807; 20150379428; 20150379429; 20150379430; 20160010162; 20160012334; 20160017037; 20160017426; 20160024575; 20160029643; 20160029945; 20160032388; 20160034640; 20160034664; 20160038538; 20160040184; 20160040236; 20160042009; 20160042197; 20160045466; 20160046991; 20160048925; 20160053322; 20160058717; 20160063144; 20160068890; 20160068916; 20160075665; 20160078361; 20160097082; 20160105801; 20160108473; 20160108476; 20160110657; 20160110812; 20160122396; 20160124933; 20160125292; 20160138105; 20160139122; 20160147013; 20160152538; 20160163132; 20160168639; 20160171618; 20160171619; 20160173122; 20160175321; 20160198657; 20160202239; 20160203279; 20160203316; 20160222100; 20160222450; 20160224724; 20160224869; 20160228056; 20160228392; 20160237487; 20160243190; 20160243215; 20160244836; 20160244837; 20160244840; 20160249152; 20160250228; 20160251720; 20160253324; 20160253330; 20160259883; 20160265055; 20160271144; 20160281105; 20160281164; 20160282941; 20160295371; 20160303111; 20160303172; 20160306075; 20160307138; 20160310442; 20160319352; 20160344738; 20160352768; 20160355886; 20160359683; 20160371782; 20160378942; 20170004409; 20170006135; 20170007574; 20170009295; 20170014032; 20170014108; 20170016896; 20170017904; 20170022563; 20170022564; 20170027940; 20170028006; 20170029888; 20170029889; 20170032100; 20170035011; 20170037470; 20170046499; 20170051019; 20170051359; 20170052945; 20170056468; 20170061073; 20170067121; 20170068795; 20170071884; 20170073756; 20170074878; 20170076303; 20170088900; 20170091673; 20170097347; 20170098240; 20170098257; 20170098278; 20170099836; 20170100446; 20170103190; 20170107583; 20170108502; 20170112792; 20170116624; 20170116653; 20170117064; 20170119662; 20170124520; 20170124528; 20170127110; 20170127180; 20170135647; 20170140122; 20170140424; 20170145503; 20170151217; 20170156344; 20170157249; 20170159045; 20170159138; 20170168070; 20170177813; 20170180798; 20170193647; 20170196481; 20170199845; 20170214799; 20170219451; 20170224268; 20170226164; 20170228810; 20170231221; 20170233809; 20170233815; 20170235894; 20170236060; 20170238850; 20170238879; 20170242972; 20170246963; 20170247673; 20170255888; 20170255945; 20170259178; 20170261645; 20170262580; 20170265044; 20170268066; 20170270580; 20170280717; 20170281747; 20170286594; 20170286608; 20170286838; 20170292159; 20170298126; 20170300814; 20170300824; 20170301017; 20170304248; 20170310697; 20170311895; 20170312289; 20170312315; 20170316150; 20170322928; 20170344554; 20170344555; 20170344556; 20170344954; 20170347242; 20170350705; 20170351689; 20170351806; 20170351811; 20170353825; 20170353826; 20170353827; 20170353941; 20170363738; 20170364596; 20170364817; 20170369534; 20170374521; 20180000102; 20180003722; 20180005149; 20180010136; 20180010185; 20180010197; 20180010198; 20180011110; 20180014771; 20180017545; 20180017564; 20180017570; 20180020951; 20180021279; 20180031589; 20180032876; 20180032938; 20180033088; 20180038994; 20180049636; 20180051344; 20180060513; 20180062941; 20180064666; 
20180067010; 20180067118; 20180071285; 20180075357; 20180077146; 20180078605; 20180080081; 20180085168; 20180085355; 20180087098; 20180089389; 20180093418; 20180093419; 20180094317; 20180095450; 20180108431; 20180111051; 20180114128; 20180116987; 20180120133; 20180122020; 20180128824; 20180132725; 20180143986; 20180148776; 20180157758; 20180160982; 20180171407; 20180182181; 20180185519; 20180191867; 20180192936; 20180193652; 20180201948; 20180206489; 20180207248; 20180214404; 20180216099; 20180216100; 20180216101; 20180216132; 20180216197; 20180217141; 20180217143; 20180218117; 20180225585; 20180232421; 20180232434; 20180232661; 20180232700; 20180232702; 20180232904; 20180235549; 20180236027; 20180237825; 20180239829; 20180240535; 20180245154; 20180251819; 20180251842; 20180254041; 20180260717; 20180263962; 20180275629; 20180276325; 20180276497; 20180276498; 20180276570; 20180277146; 20180277250; 20180285765; 20180285900; 20180291398; 20180291459; 20180291474; 20180292384; 20180292412; 20180293462; 20180293501; 20180293759; 20180300333; 20180300639; 20180303354; 20180303906; 20180305762; 20180312923; 20180312926; 20180314964; 20180315507; 20180322203; 20180323882; 20180327740; 20180327806; 20180327844; 20180336534; 20180340231; 20180344841; 20180353138; 20180357361; 20180357362; 20180357529; 20180357565; 20180357726; 20180358118; 20180358125; 20180358128; 20180358132; 20180359608; 20180360892; 20180365521; 20180369238; 20180369696; 20180371553; 20190000750; 20190001219; 20190004996; 20190005586; 20190010548; 20190015035; 20190017117; 20190017123; 20190024174; 20190032136; 20190033078; 20190034473; 20190034474; 20190036779; 20190036780; and 20190036816.
Ordinary linear regression predicts the expected value of a given unknown quantity (the response variable, a random variable) as a linear combination of a set of observed values (predictors). This implies that a constant change in a predictor leads to a constant change in the response variable (i.e. a linear-response model). This is appropriate when the response variable has a normal distribution (intuitively, when a response variable can vary essentially indefinitely in either direction with no fixed “zero value”, or more generally for any quantity that only varies by a relatively small amount, e.g. human heights). However, these assumptions can be inappropriate for some types of response variables. For example, in cases where the response variable is expected to be always positive and varying over a wide range, constant input changes lead to geometrically varying, rather than constantly varying, output changes.
In a GLM, each outcome Y of the dependent variables is assumed to be generated from a particular distribution in the exponential family, a large range of probability distributions that includes the normal, binomial, Poisson and gamma distributions, among others.
The GLM consists of three elements: a probability distribution from the exponential family; a linear predictor η=Xβ; and a link function g such that E(Y)=μ=g⁻¹(η). The linear predictor is the quantity which incorporates the information about the independent variables into the model. The symbol η (Greek “eta”) denotes the linear predictor, and is related to the expected value of the data through the link function. η is expressed as a linear combination (thus, “linear”) of unknown parameters β, with the coefficients of the linear combination represented as the matrix of independent variables X. The link function g provides the relationship between the linear predictor and the mean of the distribution function. There are many commonly used link functions, and their choice is informed by several considerations. There is always a well-defined canonical link function which is derived from the exponential of the response's density function. However, in some cases it makes sense to try to match the domain of the link function to the range of the distribution function's mean, or to use a non-canonical link function for algorithmic purposes, for example Bayesian probit regression. For the most common distributions, the mean is one of the parameters in the standard form of the distribution's density function, and the canonical link is then the function that maps the density function into its canonical form. A simple, important example of a generalized linear model (also an example of a general linear model) is linear regression. In linear regression, the use of the least-squares estimator is justified by the Gauss-Markov theorem, which does not assume that the distribution is normal.
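By way of illustration, the three GLM elements may be exercised in R with a Poisson regression and a log link; the data are simulated and the coefficients hypothetical:

# Illustrative sketch (R): a Poisson GLM with log link. Distribution: Poisson;
# linear predictor: eta = 0.5 + 1.2*x; link: g = log, so E(Y) = exp(eta).
set.seed(1)
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))
fit <- glm(y ~ x, family = poisson(link = "log"))  # fitted by iteratively reweighted least squares
coef(fit)  # estimates recover approximately (0.5, 1.2)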
The standard GLM assumes that the observations are uncorrelated. Extensions have been developed to allow for correlation between observations, as occurs for example in longitudinal studies and clustered designs. Generalized estimating equations (GEEs) allow for the correlation between observations without the use of an explicit probability model for the origin of the correlations, so there is no explicit likelihood. They are suitable when the random effects and their variances are not of inherent interest, as they allow for the correlation without explaining its origin. The focus is on estimating the average response over the population (“population-averaged” effects) rather than the regression parameters that would enable prediction of the effect of changing one or more components of X on a given individual. GEEs are usually used in conjunction with Huber-White standard errors. Generalized linear mixed models (GLMMs) are an extension to GLMs that includes random effects in the linear predictor, giving an explicit probability model that explains the origin of the correlations. The resulting “subject-specific” parameter estimates are suitable when the focus is on estimating the effect of changing one or more components of X on a given individual. GLMMs are also referred to as multilevel models and as mixed models. In general, fitting GLMMs is more computationally complex and intensive than fitting GEEs.
In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. GAMs were originally developed by Trevor Hastie and Robert Tibshirani to blend properties of generalized linear models with additive models.
The model relates a univariate response variable Y to some predictor variables. An exponential family distribution is specified for Y (for example a normal, binomial or Poisson distribution) along with a link function g (for example the identity or log function) relating the expected value of Y to the predictor variables.
The functions may have a specified parametric form (for example a polynomial, or an un-penalized regression spline of a variable) or may be specified non-parametrically, or semi-parametrically, simply as ‘smooth functions’, to be estimated by non-parametric means. A typical GAM might use a scatterplot smoothing function, such as a locally weighted mean. This flexibility to allow non-parametric fits, with relaxed assumptions on the actual relationship between response and predictor, provides the potential for better fits to data than purely parametric models, but arguably with some loss of interpretability.
Any multivariate function can be represented as sums and compositions of univariate functions. Unfortunately, though the Kolmogorov-Arnold representation theorem asserts the existence of a function of this form, it gives no mechanism whereby one could be constructed. Certain constructive proofs exist, but they tend to require highly complicated (i.e., fractal) functions, and thus are not suitable for modeling approaches. It is not clear that any step-wise (i.e. backfitting algorithm) approach could even approximate a solution. Therefore, the Generalized Additive Model drops the outer sum, and demands instead that the function belong to a simpler class.
The original GAM fitting method estimated the smooth components of the model using non-parametric smoothers (for example smoothing splines or local linear regression smoothers) via the backfitting algorithm. Backfitting works by iterative smoothing of partial residuals and provides a very general, modular estimation method capable of using a wide variety of smoothing methods to estimate the terms. Many modern implementations of GAMs and their extensions are built around the reduced rank smoothing approach, because it allows well-founded estimation of the smoothness of the component smooths at comparatively modest computational cost, and also facilitates implementation of a number of model extensions in a way that is more difficult with other methods. At its simplest, the idea is to replace the unknown smooth functions in the model with basis expansions, with smoothing penalties imposed on the basis coefficients to control the flexibility of the estimated functions. Smoothing bias complicates interval estimation for these models, and the simplest approach turns out to be a Bayesian one; understanding this Bayesian view of smoothing also helps to understand the REML and full Bayes approaches to smoothing parameter estimation.
Overfitting can be a problem with GAMs, especially if there is un-modelled residual auto-correlation or un-modelled overdispersion. Cross-validation can be used to detect and/or reduce overfitting problems with GAMs (or other statistical methods), and software often allows the level of penalization to be increased to force smoother fits. Estimating very large numbers of smoothing parameters is also likely to be statistically challenging, and there are known tendencies for prediction error criteria (GCV, AIC, etc.) to occasionally undersmooth substantially, particularly at moderate sample sizes, with REML being somewhat less problematic in this regard. Where appropriate, simpler models such as GLMs may be preferable to GAMs unless GAMs improve predictive ability substantially (in validation sets) for the application in question. In addition, univariate outlier detection approaches can be implemented where effective. These approaches look for values that fall outside the normal range of the distribution for a given machine component and can include calculation of Z-scores or Robust Z-scores (using the median absolute deviation).
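A minimal sketch of such a univariate screen in R follows; the vector name and the 3.5 cutoff are illustrative choices (R's mad() is scaled by 1.4826 by default, so it estimates the standard deviation under normality):

    x <- sensor_values                        # hypothetical vector of readings
    robust_z <- (x - median(x)) / mad(x)      # Robust Z-score via the MAD
    outliers <- which(abs(robust_z) > 3.5)    # flag values beyond the cutoff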
In some embodiments, the programming language ‘R’ is used as an environment for statistical computing and graphics and for creating appropriate models. Error statistics and/or the z-scores of the predicted errors are used to further minimize prediction errors.
The engine's operating ranges can be divided into multiple distinct ranges and multiple multi-dimensional models can be built to improve model accuracy.
Next, depending on the capabilities of the edge device (e.g., whether or not it can execute the programming language 'R'), engine models are deployed as R models, or equivalent database lookup tables that describe the models over the bounded region of the independent variables are generated and deployed.
The same set of training data that was used to build the model is then passed as an input set to the model, in order to create a time series of predicted sensor values. By subtracting the predicted sensor values from the measured sensor values, an error time series for all the dependent sensor values is created for the training data set. The error statistics, namely the mean and standard deviation of the training period error series, are computed and saved as the training period error statistics.
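Expressed as a sketch in R (model, training_data and the sensor column are placeholders for the fitted engine model and its training set), the training period error statistics may be computed as:

    predicted <- predict(model, newdata = training_data)  # predicted sensor values
    err       <- training_data$coolant_temp - predicted   # measured minus predicted
    err_mean  <- mean(err)                                # training error mean
    err_sd    <- sd(err)                                  # training error standard deviation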
In some embodiments, in order for the z-statistics to be valid, the edge device typically needs to aggregate more than 30 samples for every data point and provide an average value for every minute. Some embodiments implement the system with approximately 60 samples per minute (1-second sampling interval), and the edge device calculates each minute's value as the arithmetic mean of the samples collected during that minute.
Once the model is deployed to the edge device, and the system is operational, the dependent and independent sensor values can be measured in near real-time and the minute's average data may be computed. The expected value for dependent engine sensors can be predicted by passing the independent sensor values to the engine model. The error (i.e., the difference) between the measured value of a dependent variable and its predicted value can then be computed. These errors are standardized by subtracting the training error mean from the instantaneous error and dividing this difference by the training error standard deviation for a given sensor. This process creates a z-score-of-error, or standard error, time series that can be used to detect anomalies and, with an alert processing system, detect and send notifications to on-board and shore-based systems in near real-time when the standard error is above/below a certain number of error standard deviations or is above/below a certain z-score.
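A corresponding run-time sketch, assuming hypothetical names for the minute-average record, threshold and alert hook (minute_avg, z_threshold, notify()), standardizes the instantaneous error using the stored training statistics:

    pred_now <- predict(model, newdata = minute_avg)  # predicted from independent sensors
    err_now  <- minute_avg$coolant_temp - pred_now    # instantaneous error
    z_now    <- (err_now - err_mean) / err_sd         # standardized error (z-score)
    if (abs(z_now) > z_threshold) notify(z_now)       # alert on-board/shore-based systems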
According to some embodiments, an anomaly classification system may also be deployed that ties anomalies to particular kinds of engine failures. The z-scores of an error data series from multiple engine sensors are classified (as failures or not failures) in near real-time and to a high degree of certainty through previous training on problem cases, learned engine issues, and/or engine sensor issues.
This classification may be by neural network or deep neural network, clustering algorithm, principal component analysis, various statistical algorithms, or the like. Some examples are described in the incorporated references, supra.
Some embodiments of the classification system provide a mechanism (e.g., a design and deployment tool(s)) to select unique, short time periods for an asset and tag (or label) the selected periods with arbitrary strings that denote classification types. A user interface may be used to view historical engine data and/or error time series data, and to select and tag time periods of interest. Then, the system calculates robust Mahalanobis distances (and/or Bhattacharyya distances) from the z-scores of error data from multiple engine sensors of interest and stores the calculated range for the tagged time periods in the edge device and/or cloud database for further analysis.
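One possible way to obtain robust Mahalanobis distances of the error z-scores is via a minimum covariance determinant (MCD) estimate of location and scatter, for example using the robustbase package in R; here Z is an observations-by-sensors matrix of error z-scores and tagged_rows is a hypothetical index of the tagged period:

    library(robustbase)
    mcd <- covMcd(Z)                              # robust center and covariance (MCD)
    d2  <- mahalanobis(Z, center = mcd$center, cov = mcd$cov)  # squared distances
    tag_range <- range(sqrt(d2[tagged_rows]))     # stored range for the tagged period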
The Bhattacharyya distance measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient which is a measure of the amount of overlap between two statistical samples or populations. The coefficient can be used to determine the relative closeness of the two samples being considered. It is used to measure the separability of classes in classification and it is considered to be more reliable than the Mahalanobis distance, as the Mahalanobis distance is a particular case of the Bhattacharyya distance when the standard deviations of the two classes are the same. Consequently, when two classes have similar means but different standard deviations, the Mahalanobis distance would tend to zero, whereas the Bhattacharyya distance grows depending on the difference between the standard deviations.
The Bhattacharyya distance is a measure of divergence. It can be defined formally as follows. Let (Ω, B, ν) be a measure space, and let P be the set of all probability measures (cf. Probability measure) on B that are absolutely continuous with respect to ν. Consider two such probability measures P1, P2∈P and let p1 and p2 be their respective density functions with respect to ν. The Bhattacharyya coefficient between P1 and P2, denoted by ρ(P1, P2), is defined by ρ(P1, P2)=∫Ω(p1p2)^1/2 dν=∫Ω((dP1/dν)(dP2/dν))^1/2 dν,
where dPi/dν is the Radon-Nikodým derivative (cf. Radon-Nikodým theorem) of Pi (i=1, 2) with respect to ν. It is also known as the Kakutani coefficient and the Matusita coefficient. Note that ρ(P1, P2) does not depend on the measure ν dominating P1 and P2.
The Bhattacharyya coefficient has the following properties:
i) 0≤ρ(P1, P2)≤1;
ii) ρ(P1, P2)=1 if and only if P1=P2;
iii) ρ(P1, P2)=0 if and only if P1 is orthogonal to P2.
The Bhattacharyya distance between two probability distributions P1 and P2, denoted by B(1,2), is defined by B(1,2)=−ln ρ(P1, P2).
0≤B(1,2)≤∞. The distance B(1,2) does not satisfy the triangle inequality. The Bhattacharyya distance comes out as a special case of the Chernoff distance C(1,2; t)=−ln∫Ω p1^t p2^(1−t) dν, taking t=1/2.
The Hellinger distance between two probability measures P1 and P2, denoted by H(1,2), is related to the Bhattacharyya coefficient by the following relation: H(1,2)={2[1−ρ(P1,P2)]}^1/2.
B(1,2) is called the Bhattacharyya distance since it is defined through the Bhattacharyya coefficient. If one uses the Bayes criterion for classification and attaches equal costs to each type of misclassification, then the total probability of misclassification is majorized by e^−B(1,2). In the case of equal covariances, maximization of B(1,2) yields the Fisher linear discriminant function.
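For two multivariate normal distributions with means μ1, μ2 and covariances Σ1, Σ2, the Bhattacharyya distance has the well-known closed form B(1,2)=(1/8)(μ1−μ2)^T Σ^−1(μ1−μ2)+(1/2)ln(det Σ/(det Σ1 det Σ2)^1/2), where Σ=(Σ1+Σ2)/2; a sketch in R:

    bhattacharyya_normal <- function(mu1, S1, mu2, S2) {
      S <- (S1 + S2) / 2                           # pooled covariance
      d <- mu1 - mu2
      as.numeric(0.125 * t(d) %*% solve(S) %*% d) +
        0.5 * log(det(S) / sqrt(det(S1) * det(S2)))
    }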
The Mahalanobis distance is a measure of the distance between a point P and a distribution D. It is a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D. This distance is zero if P is at the mean of D, and grows as P moves away from the mean: along each principal component axis, the Mahalanobis distance measures the number of standard deviations from P to the mean of D. If each of these axes is re-scaled to have unit variance, then the Mahalanobis distance corresponds to standard Euclidean distance in the transformed space. The Mahalanobis distance is thus unitless and scale-invariant and takes into account the correlations of the data set.
The Mahalanobis distance is the quantity ρ(X, Y|A)={(X−Y)^T A(X−Y)}^1/2, where X, Y are vectors and A is a matrix (the superscript T denotes transposition). It is used in multi-dimensional statistical analysis; in particular, for testing hypotheses and the classification of observations. The quantity ρ(μ1, μ2|Σ^−1) is a distance between two normal distributions with expectations μ1 and μ2 and common covariance matrix Σ. The Mahalanobis distance between two samples (from distributions with identical covariance matrices), or between a sample and a distribution, is defined by replacing the corresponding theoretical moments by sampling moments. As an estimate of the Mahalanobis distance between two distributions one uses the Mahalanobis distance between the samples extracted from these distributions or, in the case where a linear discriminant function is utilized, the statistic Φ^−1(α)+Φ^−1(β), where α and β are the frequencies of correct classification in the first and the second collection, respectively, and Φ is the normal distribution function with expectation 0 and variance 1.
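Base R computes squared Mahalanobis distances directly; a toy sketch with a simulated two-sensor sample:

    x  <- matrix(rnorm(200), ncol = 2)                        # toy sample, 100 x 2
    d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))  # squared distances
    d  <- sqrt(d2)                                            # distance in standard-deviation units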
At run time, the system calculates the z-scores of error data from the engine sensor data time series, and then optionally calculates the robust Mahalanobis distance (and/or Bhattacharyya distance) of the z-scores of error data of the selected dimension(s) (i.e., engine sensor(s)). The value is compared against the range of Mahalanobis distances (and/or Bhattacharyya distances) that was stored previously as part of the deployed classification labels (specific type of failure or not specific type of failure); in effect, the set of tensors of z-scores of errors during the test period is analyzed and compared against the set of tensors of z-scores of errors during the training periods that had a positive match and tagging, and is classified accordingly. When a failure classification is obtained, the alerts system sends notifications to human operators and/or automated systems.
Some embodiments can then provide a set of data as an input to a user interface (e.g., analysis gauges) in the form of standardized error values for each sensor and/or the combined Mahalanobis distance (or Bhattacharyya distance) for each sensor. This allows users to understand why data were classified as failures or anomalies.
Of note, the system does not necessarily differentiate between operational engine issues and engine sensor issues. Rather, it depends on the classifications made during the deep learning training period in accordance with some embodiments. Also, because the system uses standardized z-errors for creating the knowledge base of issues (i.e., tags and Mahalanobis/Bhattacharyya distance ranges and standardized error ranges), the model can be deployed as a prototype for other engines and/or machines of similar types before an engine-specific model is created.
It is therefore an object to provide a method of determining anomalous operation of a system, comprising: capturing a stream of data representing sensed or determined operating parameters of the system, wherein the operating parameters vary in dependence on an operating state of the system, over a range of operating states of the system, with a stability indicator representing whether the system was operating in a stable state when the operating parameters were sensed or determined; characterizing statistical properties of the stream of data, comprising at least an amplitude-dependent parameter and a variance of the amplitude over time parameter for an operating regime representing stable operation; determining a statistical norm for the characterized statistical properties that reliably distinguishes between normal operation of the system and anomalous operation of the system; and outputting a signal dependent on whether a concurrent stream of data representing sensed or determined operating parameters of the system represents anomalous operation of the system.
It is also an object to provide a method of determining anomalous operation of a system, comprising: capturing a plurality of streams of training data representing sensor readings over a range of states of the system during a training phase; characterizing joint statistical properties of the plurality of streams of data representing sensor readings over the range of states of the system during the training phase, comprising determining a plurality of quantitative standardized errors between a predicted value of a respective training datum, and a measured value of the respective training datum, and a variance of the respective plurality of quantitative standardized errors over time; determining a statistical norm for the characterized joint statistical properties that reliably distinguishes between a normal state of the system and an anomalous state of the system; and storing the determined statistical norm in a non-volatile memory.
It is also an object to provide a method of predicting anomalous operation of a system, comprising: characterizing statistical properties of a plurality of streams of data representing sensor readings over a range of states of the system during a training phase, comprising determining a statistical variance over time of quantitative standardized errors between a predicted value of a respective training datum and a measured value of the respective training datum; determining a statistical norm for the characterized statistical properties comprising at least one decision boundary that reliably distinguishes between a normal operational state of the system and an anomalous operational state of the system; and storing the determined statistical norm in a non-volatile memory.
It is a further object to provide a system for determining anomalous operational state, comprising: an input port configured to receive a plurality of streams of training data representing sensor readings over a range of states of the system during a training phase; at least one automated processor, configured to: characterize joint statistical properties of the plurality of streams of data representing sensor readings over the range of states of the system during the training phase, based on a plurality of quantitative standardized errors between a predicted value of a respective training datum, and a measured value of the respective training datum, and a variance of the respective plurality of quantitative standardized errors over time; and determine a statistical norm for the characterized joint statistical properties that reliably distinguishes between a normal state of the system and an anomalous state of the system; and a non-volatile memory configured to store the determined statistical norm.
Another object provides a method of determining anomalous operation of a system, comprising: capturing a plurality of streams of training data representing sensor readings over a range of states of the system during a training phase; transmitting the captured streams of training data to a remote server; receiving, from the remote server, a statistical norm for characterized joint statistical properties that reliably distinguishes between a normal state of the system and an anomalous state of the system, the characterized joint statistical properties being based on a plurality of streams of data representing sensor readings over the range of states of the system during the training phase, comprising quantitative standardized errors between a predicted value of a respective training datum, and a measured value of the respective training datum, and a variance of the respective plurality of quantitative standardized errors over time; capturing a stream of data representing sensor readings over states of the system during an operational phase; and producing a signal selectively dependent on whether the stream of data representing sensor readings over states of the system during the operational phase is within the statistical norm.
A further object provides a method of determining a statistical norm for non-anomalous operation of a system, comprising: receiving a plurality of captured streams of training data at a remote server, the captured plurality of streams of training data representing sensor readings over a range of states of a system during a training phase; processing the received plurality of captured streams of training data to determine a statistical norm for characterized joint statistical properties that reliably distinguishes between a normal state of the system and an anomalous state of the system, the characterized joint statistical properties being based on a plurality of streams of data representing sensor readings over the range of states of the system during the training phase, comprising quantitative standardized errors between a predicted value of a respective training datum, and a measured value of the respective training datum, and a variance of the respective plurality of quantitative standardized errors over time; and transmitting the determined statistical norm to the system. The method may further comprise, at the system, capturing a stream of data representing sensor readings over states of the system during an operational phase, and producing a signal selectively dependent on whether the stream of data representing sensor readings over states of the system during the operational phase is within the statistical norm.
A non-transitory computer-readable medium is also encompassed, storing therein instructions for controlling a programmable processor to perform any or all steps of a computer-implemented method disclosed herein.
At least one stream of training data may be aggregated prior to characterizing the joint statistical properties of the plurality of streams of data representing the sensor readings over the range of states of the system during the training phase.
The method may further comprise communicating the captured plurality of streams of training data representing sensor readings over a range of states of the system during a training phase from an edge device to a cloud device prior to the cloud device characterizing the joint statistical property of the plurality of streams of operational data; communicating the determined statistical norm from the cloud device to the edge device; and wherein the non-volatile memory may be provided within the edge device.
The method may further comprise capturing a plurality of streams of operational data representing sensor readings during an operational phase; determining a plurality of quantitative standardized errors between a predicted value of a respective operational datum, and a measured value of the respective operational datum, and a variance of the respective plurality of quantitative standardized errors over time in the edge device; and comparing the plurality of quantitative standardized errors and the variance of the respective plurality of quantitative standardized errors with the determined statistical norm, to determine whether the plurality of streams of operational data representing the sensor readings during the operational phase represent an anomalous state of system operation.
The method may further comprise capturing a plurality of streams of operational data representing sensor readings during an operational phase; characterizing a joint statistical property of the plurality of streams of operational data, comprising determining a plurality of quantitative standardized errors between a predicted value of a respective operational datum, and a measured value of the respective operational datum, and a variance of the respective plurality of quantitative standardized errors over time; and comparing the characterized joint statistical property of the plurality of streams of operational data with the determined statistical norm to determine whether the plurality of streams of operational data representing the sensor readings during the operational phase represent an anomalous state of system operation.
The method may further comprise capturing a plurality of streams of operational data representing sensor readings during an operational phase; and determining at least one of a Mahalanobis distance, a Bhattacharyya distance, a Chernoff distance, a Matusita distance, a KL divergence, a symmetric KL divergence, a Patrick-Fisher distance, a Lissack-Fu distance, and a Kolmogorov distance of the captured plurality of streams of operational data with respect to the determined statistical norm. The method may further comprise determining a Mahalanobis distance between the plurality of streams of training data representing sensor readings over the range of states of the system during the training phase and a captured plurality of streams of operational data representing sensor readings during an operational phase of the system. The method may further comprise determining a Bhattacharyya distance between the plurality of streams of training data representing sensor readings over the range of states of the system during the training phase and a captured plurality of streams of operational data representing sensor readings during an operational phase of the system.
The method may further comprise determining an anomalous state of operation based on a statistical difference between sensor data obtained during operation of the system subsequent to the training phase and the statistical norm. The method may further comprise performing an analysis on the sensor data obtained during the anomalous state, defining a signature of the sensor data obtained leading to the anomalous state, and communicating the defined signature of the sensor data obtained leading to the anomalous state to a second system. The method may still further comprise receiving a defined signature of sensor data obtained leading to an anomalous state of a second system from the second system and performing a signature analysis of a stream of sensor data after the training phase. The method may further comprise receiving a defined signature of sensor data obtained leading to an anomalous state of a second system from the second system, and integrating the defined signature with the determined statistical norm, such that the statistical norm may be updated to distinguish a pattern of sensor data preceding the anomalous state from a normal state of operation.
The method may further comprise determining a z-score for the plurality of quantitative standardized errors. The method may further comprise determining a z-score for a stream of sensor data received after the training phase. The method may further comprise decimating a stream of sensor data received after the training phase. The method may further comprise decimating and determining a z-score for a stream of sensor data received after the training phase.
The method may further comprise receiving a stream of sensor data received after the training phase; determining an anomalous state of operation of the system based on differences between the received stream of sensor data and the determined statistical norm; and tagging a log of sensor data received after the training phase with an annotation of the anomalous state of operation. The method may further comprise classifying the anomalous state of operation as a particular kind of event.
The plurality of streams of training data representing the sensor readings over the range of states of the system may comprise data from a plurality of different types of sensors. The plurality of streams of training data representing the sensor readings over the range of states of the system may comprise data from a plurality of different sensors of the same type. The method may further comprise classifying a stream of sensor data received after the training phase by at least performing a k-nearest neighbors analysis. The method may further comprise determining whether a stream of sensor data received after the training phase may be in a stable operating state and tagging a log of the stream of sensor data with a characterization of the stability.
The method may include at least one of: transmitting the plurality of streams of training data to a remote server; transmitting the characterized joint statistical properties to the remote server; transmitting the statistical norm to the remote server; transmitting a signal representing a determination of whether the system is operating anomalously to the remote server based on the statistical norm; receiving the characterized joint statistical properties from the remote server; receiving the statistical norm from the remote server; receiving a signal representing a determination of whether the system is operating anomalously from the remote server based on the statistical norm; and receiving a signal from the remote server representing a predicted statistical norm for operation of the system, representing a type of operation of the system outside the range of states during the training phase, based on respective statistical norms for other systems.
According to one embodiment, upon initiation of the system, there is no initial model, and the edge device sends lossless uncompressed data to the cloud computer for analysis. Once a model is built and synchronized or communicated by both sides of a communication pair, the communications between them may synchronously switch to a lossy compressed mode of data communication. In cases where different operating regimes have models of different maturity, the edge device may determine on a class-by-class basis what mode of communication to employ. Further, in some cases, the compression of the data may be tested according to different algorithms, and the optimal algorithm employed, according to criteria which may include communication cost or efficiency, various risks and errors or cost-weighted risks and errors in anomaly detection, or the like. In some cases, the computational complexity and storage requirements of compression are also an issue, especially in lightweight IoT sensors with limited memory and processing power.
In one embodiment, the system can initially use a "stock" model and corresponding "stock statistical parameters" (standard deviation of error and mean error) in the beginning, when there is no custom or system-specific model built for that specific asset; later, when the edge device builds a new and sufficiently complete model, it will send that model to the cloud computer, and then both sides can synchronously switch to the new model. In this scheme only the edge device would build the models, as the cloud always receives lossy data. As discussed above, the stock model may initiate with population statistics for the class of system, and as individual-specific data is acquired, update the model to reflect the specific device rather than the population of devices. The transition between models need not be binary, and some blending of population parameters and device-specific parameters may be present or persistent in the system. This is especially useful where the training data is sparse or unavailable for certain regimes of operation, or where certain types of anomalies cannot or should not be emulated during training. Thus, certain catastrophic anomalies may be preceded by signature patterns, which may be included in the stock model. Typically, the system will not, during training, explore operating regions corresponding to imminent failure, and therefore the operating regimes associated with those states will remain unexplored. Thus, the aspects of the stock model relating to these regimes of operation may naturally persist, even after the custom model is mature.
In some embodiments, to ensure continuous effective monitoring of anomalies, the system can automatically monitor itself for the presence of drift. Drift can be detected for a sensor when models no longer fit the most recent data well and the frequency of type I errors the system detects exceeds an acceptable, pre-specified threshold. Type I errors can be determined by identifying when a model predicts an anomaly and no true anomaly is detected in a defined time window around the predicted anomaly.
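A drift monitor of this kind may be sketched in R as follows; the alert log, confirmation flag, threshold and retraining trigger (alerts, confirmed, type1_threshold, rebuild_models()) are hypothetical names:

    recent <- subset(alerts, time > Sys.time() - 7 * 24 * 3600)  # e.g., last 7 days
    type1_rate <- mean(!recent$confirmed)  # predicted anomalies with no true anomaly nearby
    if (type1_rate > type1_threshold) rebuild_models()  # trigger new model generation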
True anomalies can be identified when a user provides input in near real-time indicating whether a predicted anomaly is a false alert, or when a threshold set on a sensor is exceeded. Thresholds can either be set by following the manufacturer's specifications for normal operating ranges, or by setting statistical thresholds determined by analyzing the distribution of data during normal sensor operation and identifying high and low thresholds.
In these embodiments, when drift is detected, the system can trigger generation of new models (e.g., of same or different model types) on the most recent data for the sensor. The system can compare the performance of different models or model types on identical test data sampled from the most recent sensor data and put a selected model (e.g., a most effective model) into deployment or production. The most effective model can be the model that has the highest recall (lowest rate of type II errors), lowest false positive rate (lowest rate of type I errors), and/or maximum lead time of prediction (largest amount of time that it predicts anomalies before manufacturer-recommended thresholds detect them). However, if there is no model whose false positive rate falls below a specified level, the system will not put a model into production. In that case, once more recent data is captured, the system will undertake subsequent attempts at model generation until successful.
In some embodiments, the anomaly detection system described herein may be used to determine engine coolant temperature anomalies on a marine vessel such as a tugboat.
An engine's data 1008 are accessed from a database 1010 to be used as input data for model generation.
If enough rows of engine data 1008 are available 1012, the model building process begins by filtering the engine data time series 1008. An iterator 1050 slices a data row from the set of n rows 1020. If the predictor variables are within the acceptable range 1022 and the engine data are stable 1024 as defined by the model metadata table 1006, the data row is included in the set of data rows to be used in the model 1026. If the predictor variables' data is not within range or engine data are not stable, the data row is excluded 1028 from the set of data rows to be used in the model 1026. The data filtering process then continues for each data row in the engine data time series 1008.
If enough data rows are available after filtering 1030, the engine model(s) is generated using machine learning 1032. Algorithm 1 additionally details the data filtering and model(s) generation process in which the stability of predictor variables is determined and used as a filter for model input data. The machine learning model 1032 may be created using a number of appropriate modeling techniques or machine learning algorithms (e.g., splines, support vector machines, neural networks, and/or generalized additive model). In some implementations, the model with the lowest model bias and lowest mean squared error (MSE) is selected as the model for use in subsequent steps.
If too few data rows are available after filtering 1030, a specific error message may be displayed 1016 and the model generation routine ended 1018.
If enough data rows are available 1030 and the machine-learning based model has been generated 1032, the model may optionally be converted into a lookup table, using Algorithm 2, as a means of serializing the model for faster processing. The lookup table can contain n+m columns, considering the model represents ƒ: ℝ^n→ℝ^m. For engine RPM between 0 and 2000 RPM and load between 0 and 100%, the lookup table can have 2001×101=202,101 rows assuming an interval of 1 for each independent variable. The table can have 2+6=8 columns assuming independent variables of engine RPM and load, and dependent variables of coolant temperature, coolant pressure, oil temperature, oil pressure, fuel pressure, and fuel actuator percentage. For each engine RPM and load, the model is used to predict the values of the dependent parameters, with the results stored in the lookup table.
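An illustrative sketch of this serialization step (assuming a fitted model object and showing a single dependent variable; the remaining dependent sensors would be added as further columns) may be written in R as:

    grid <- expand.grid(rpm = 0:2000, load = 0:100)  # bounded independent-variable grid
    grid$coolant_temp <- as.numeric(predict(model, newdata = grid))
    write.csv(grid, "engine_lookup.csv", row.names = FALSE)  # deployable lookup table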
With the model 1032 known, the training period error statistics can be calculated as described in Algorithm 3. Using the generated model 1032, a prediction for all dependent sensor values can be made based on that generated model 1032 and data for the independent variables during the training period.
Algorithm 4 describes how the error statistics can be standardized into an error z-score series. The error z-score series is calculated by subtracting the error series mean from each error in the error time series and dividing the result by the error standard deviation, using error statistics from Algorithm 3.
With the error z-score series calculated and the model deployed to the edge device and/or cloud database, the design time steps of Algorithm 5 are complete. At runtime, engine data are stored in a database either at the edge or in the cloud. Using Algorithm 4 with the training error statistics of Algorithm 3, the test data error z-scores can be calculated. If the absolute value of a test data error z-score is above a given threshold (e.g., user defined or automatically generated), an anomaly condition is identified. An error notification may be sent or other operation taken based on this error condition.
Algorithm 6, which details the calculation of the Mahalanobis distance and/or robust Mahalanobis distance, can be used along with Algorithm 7 to classify anomalies and attempt to identify the anomalies that may lead to a failure. To create the Mahalanobis and/or robust Mahalanobis distance, the training period error z-score series (e.g., the series of error z-scores from the multiple engine sensors of interest during the tagged training periods) is used as the reference set from which the distances are computed.
As used herein, the term “processor” may refer to any device or portion of a device that processes electronic data from registers and/or memory to transform that electronic data into other electronic data that may be stored in registers and/or memory.
A system which implements the various embodiments of the presently disclosed technology may be constructed as follows. The system includes at least one controller that may include any or any combination of a system-on-chip, or a commercially available embedded processor, Arduino, MeOS, MicroPython, Raspberry Pi, or other type of processor board. The system may also include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a programmable combinatorial circuit (e.g., FPGA), a processor (shared, dedicated, or group) or memory (shared, dedicated, or group) that may execute one or more software or firmware programs, or other suitable components that provide the described functionality. The controller has an interface to a communication port, e.g., a radio or network device, a user interface, and other peripherals and other system components.
In some embodiments, one or more of the sensors that determine, sense, and/or provide data regarding one or more characteristics to the controller may be and/or include Internet of Things ("IoT") devices. IoT devices may be objects or "things", each of which may be embedded with hardware or software that may enable connectivity to a network, typically to provide information to a system, such as the controller. Because the IoT devices are enabled to communicate over a network, the IoT devices may exchange event-based data with service providers or systems in order to enhance or complement the services that may be provided. These IoT devices are typically able to transmit data autonomously or with little to no user intervention. In some embodiments, a connection may accommodate vehicle sensors as IoT devices and may include IoT-compatible connectivity, which may include any or all of WiFi, LoRa, 900 MHz WiFi, Bluetooth, Bluetooth Low Energy, USB, UWB, etc. Wired connections, such as Ethernet 100BaseT, 1000BaseT, CANBus, USB 2.0, USB 3.0, USB 3.1, etc., may be employed.
Embodiments may be implemented into a system using any suitable hardware and/or software configured as desired. The computing device may house a board such as a motherboard which may include a number of components, including but not limited to a processor and at least one communication interface device. The processor may include one or more processor cores physically and electrically coupled to the motherboard. The at least one communication interface device may also be physically and electrically coupled to the motherboard. In further implementations, the communication interface device may be part of the processor. In embodiments, the processor may include a hardware accelerator (e.g., FPGA).
Depending on its applications, computing device used in the system may include other components which include, but are not limited to, volatile memory (e.g., DRAM), non-volatile memory (e.g., ROM), and flash memory. In embodiments, flash and/or ROM may include executable programming instructions configured to implement the algorithms, operating system, applications, user interface, and/or other aspects in accordance with various embodiments of the presently disclosed technology.
In embodiments, the computing device used in the system may further include an analog-to-digital converter, a digital-to-analog converter, a programmable gain amplifier, a sample-and-hold amplifier, a data acquisition subsystem, a pulse width modulator input, a pulse width modulator output, a graphics processor, a digital signal processor, a crypto processor, a chipset, a cellular radio, an antenna, a display, a touchscreen display, a touchscreen controller, a battery, an audio codec, a video codec, a power amplifier, a global positioning system (GPS) device or subsystem, a compass (magnetometer), an accelerometer, a barometer (manometer), a gyroscope, a speaker, a camera, a mass storage device (such as a SIM card interface, an SD memory or micro-SD memory interface, a SATA interface, a hard disk drive, a compact disk (CD), a digital versatile disk (DVD), and so forth), a microphone, a filter, an oscillator, a pressure sensor, and/or an RFID chip.
The communication network interface device used in the system may enable wireless communications for the transfer of data to and from the computing device. The term "wireless" and its derivatives may be used to describe circuits, devices, systems, processes, techniques, communications channels, etc., that may communicate data through the use of modulated electromagnetic radiation through a non-solid medium. The term does not imply that the associated devices do not contain any wires, although in some embodiments they might not. The communication chip 406 may implement any of a number of wireless standards or protocols, including but not limited to Institute of Electrical and Electronics Engineers (IEEE) standards including Wi-Fi (IEEE 802.11 family), IEEE 802.16 standards (e.g., IEEE 802.16-2005 Amendment), and the Long-Term Evolution (LTE) project along with any amendments, updates, and/or revisions (e.g., the advanced LTE project, the ultra-mobile broadband (UMB) project (also referred to as "3GPP2"), etc.). IEEE 802.16 compatible BWA networks are generally referred to as WiMAX networks, an acronym that stands for Worldwide Interoperability for Microwave Access, which is a certification mark for products that pass conformity and interoperability tests for the IEEE 802.16 standards. The communication chip 406 may operate in accordance with a Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Evolved HSPA (E-HSPA), or LTE network. The communication chip 406 may operate in accordance with Enhanced Data for GSM Evolution (EDGE), GSM EDGE Radio Access Network (GERAN), Universal Terrestrial Radio Access Network (UTRAN), or Evolved UTRAN (E-UTRAN). The communication chip 406 may operate in accordance with Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Digital Enhanced Cordless Telecommunications (DECT), Evolution-Data Optimized (EV-DO), derivatives thereof, as well as any other wireless protocols that are designated as 3G, 4G, 5G, and beyond. The communication chip may operate in accordance with other wireless protocols in other embodiments. The computing device may include a plurality of communication chips. For instance, a first communication chip may be dedicated to shorter range wireless communications such as Wi-Fi and Bluetooth, and a second communication chip may be dedicated to longer range wireless communications such as GPS, EDGE, GPRS, CDMA, WiMAX, LTE, EV-DO, and others.
Exemplary hardware for performing the technology includes at least one automated processor (or microprocessor) coupled to a memory. The memory may include random access memory (RAM) devices, cache memories, non-volatile or back-up memories such as programmable or flash memories, read-only memories (ROM), etc. In addition, the memory may be considered to include memory storage physically located elsewhere in the hardware, e.g. any cache memory in the processor as well as any storage capacity used as a virtual memory, e.g., as stored on a mass storage device.
The hardware may receive a number of inputs and outputs for communicating information externally. For interface with a user or operator, the hardware may include one or more user input devices (e.g., a keyboard, a mouse, imaging device, scanner, microphone) and one or more output devices (e.g., a Liquid Crystal Display (LCD) panel, a sound playback device (speaker)). To embody the present invention, the hardware may include at least one screen device.
For additional storage, as well as data input and output, and user and machine interfaces, the hardware may also include one or more mass storage devices, e.g., a floppy or other removable disk drive, a hard disk drive, a Direct Access Storage Device (DASD), an optical drive (e.g., a Compact Disk (CD) drive, a Digital Versatile Disk (DVD) drive) and/or a tape drive, among others. Furthermore, the hardware may include an interface with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), a wireless network, and/or the Internet among others) to permit the communication of information with other computers coupled to the networks. It should be appreciated that the hardware typically includes suitable analog and/or digital interfaces between the processor and each of the components, as is known in the art.
The hardware operates under the control of an operating system, and executes various computer software applications, components, programs, objects, modules, etc. to implement the techniques described above. Moreover, various applications, components, programs, objects, etc., collectively indicated by application software, may also execute on one or more processors in another computer coupled to the hardware via a network, e.g. in a distributed computing environment, whereby the processing required to implement the functions of a computer program may be allocated to multiple computers over a network.
In general, the routines executed to implement the embodiments of the present disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as a "computer program." A computer program typically comprises one or more sets of instructions, resident at various times in various memory and storage devices in a computer, that, when read and executed by one or more processors in the computer, cause the computer to perform operations necessary to execute elements involving the various aspects of the invention. Moreover, while the technology has been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments of the invention are capable of being distributed as a program product in a variety of forms, and apply equally regardless of the particular type of computer-readable media used to actually effect the distribution. Examples of computer-readable media include but are not limited to recordable type media such as volatile and non-volatile memory devices, removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs)), flash memory, etc., among others. Another type of distribution may be implemented as Internet downloads. The technology may be provided as ROM, persistently stored firmware, or hard-coded instructions.
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is understood that such embodiments are merely illustrative and not restrictive of the broad invention, and that the present disclosure is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. The disclosed embodiments may be readily modified or re-arranged in one or more of their details without departing from the principles of the present disclosure.
Implementations of the subject matter and the operations described herein can be implemented in digital electronic circuitry, computer software, firmware or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, data processing apparatus. Alternatively, or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a non-transitory computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate components or media (e.g., multiple CDs, disks, or other storage devices).
Accordingly, the computer storage medium may be tangible and non-transitory. All embodiments within the scope of the claims should be interpreted as being tangible and non-abstract in nature, and therefore this application expressly disclaims any interpretation that might encompass abstract subject matter.
The present technology provides analysis that improves the functioning of the machine in which it is installed and provides distinct results from machines that employ different algorithms.
The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The term "client" or "server" includes a variety of apparatuses, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The architecture may be CISC, RISC, SISD, SIMD, MIMD, loosely-coupled parallel processing, etc. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone (e.g., a smartphone), a personal digital assistant (PDA), a mobile audio or video player, a game console, or a portable storage device (e.g., a universal serial bus (USB) flash drive). Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., an LCD (liquid crystal display), OLED (organic light emitting diode), TFT (thin-film transistor), plasma, other flexible configuration, or any other monitor for displaying information to the user, and a keyboard, a pointing device, e.g., a mouse, trackball, etc., or a touch screen, touch pad, etc., by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user, for example, by sending webpages to a web browser on a user's client device in response to requests received from the web browser.
Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular implementations of particular inventions. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are described in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all described operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular implementations of the subject matter have been described. Other implementations are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking or parallel processing may be utilized.
The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, applications and publications to provide yet further embodiments. In cases where any document incorporated by reference conflicts with the present application, the present application controls.
These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
This application claims the benefit of provisional U.S. Application No. 62/813,659, filed Mar. 4, 2019 and entitled “SYSTEM AND METHOD FOR NEAR REAL-TIME DETECTION AND CLASSIFICATION OF MACHINE ANOMALIES USING MACHINE LEARNING,” which is hereby incorporated by reference in its entirety.