The present invention relates to the technical field of computer science, in particular to a working condition state modeling and model correcting method.
The maintenance function has become increasingly important over the past few decades. Unexpected downtime may greatly affect the maintenance function and cause operational disruption, productivity loss, or even production accidents. It is difficult to realize timely maintenance with limited maintenance resources and personnel. The efficiency of abnormality diagnosis methods often depends on the quality of the diagnosis models. The methods of establishing mathematical models can be roughly classified into two categories: mechanism analysis modeling methods and statistical modeling methods.
The mechanism analysis modeling method is to establish a mathematical equation between key variables and other measurable variables according to the physical and chemical laws in the production process from a process mechanism, and to establish a mathematical model of a system of equations describing the process through derivation. This modeling method has the advantage that the internal structure and relationships of the system are clearly shown and the nature of the actual process is reflected. However, the model is difficult to build, the modeling cycle is long, and the various structural parameters and physical parameters in the model are difficult to obtain, so the method is limited in application.
The statistical modeling method is to directly model a system as a black box only according to the relationship between input and output data in a research object instead of analyzing its internal mechanism. The model has strong online correction capability, and can be applicable to highly nonlinear and seriously uncertain systems, so as to provide an effective way for solving the model problem of complex system process parameters. However, the statistical modeling method has certain limitations. For complex nonlinear processes, sample data generally only comprises some areas, and cannot cover the entire area. The increase of the range of a sample data set may cause a complex model and increased difficulty in solving.
Aiming at the defects of the prior art, the present invention provides a working condition state modeling and model correcting method, which introduces expert prior knowledge based on the statistical modeling method to solve the problem that the existing statistical modeling method cannot cover the whole area.
To realize the above-mentioned purpose, the present invention adopts the technical solution:
A working condition state modeling and model correcting method comprises the following steps:
step 1: collecting data, and arranging the data in a chronological order to form a time sequence data set;
step 2: preprocessing the time sequence data set;
step 3: clustering the preprocessed time sequence data set, computing a central point data set of the cluster, and generating a working condition data set and a working condition process data set;
step 4: counting a working condition transition probability for the working condition process data set to form a working condition transition probability model data set;
step 5: collecting the data, and detecting and processing the data;
step 6: computing a working condition state transition mode phase by phase and processing.
The step 1 comprises:
marking time sequence labels for the collected data (x1, x2, . . . , xm) to form a time sequence data set (ti, xi1, xi2, . . . , xim), wherein m represents the number of parameters; ti represents the time sequence labels which are gradually increased; and x represents different parameters.
The step 2 comprises:
deleting irrelevant parameters from the time sequence data set (ti, xi1, xi2, . . . , xim) to obtain a time sequence data set (ti, xi1, xi2, . . . , xin) after dimension reduction, n≤m, wherein ti represents the time sequence labels which are gradually increased; m represents the number of parameters; n represents the number of parameters after dimension reduction; and x represents different parameters.
The dimension reduction comprises:
respectively computing a variance for each dimension of the parameters to obtain (σ1, σ2, . . . , σm); computing the mean value $\bar{\sigma} = \frac{1}{m}\sum_{j=1}^{m}\sigma_j$ of the variances, and deleting the values in (σ1, σ2, . . . , σm) that are less than a threshold based on this mean value.
A k-means algorithm is used for clustering, specifically:
the input serving as a data set (xi1, xi2, . . . , xin) after dimension reduction, the range of k values being [Kmin, Kmax];
conducting k-means clustering on the data set (xi1, xi2, . . . , xin) after dimension reduction for each k value, and solving the sum of squared errors (SSE) value in clusters for each clustering result;
using cluster partitions (C1, C2, . . . , CK) as output when min(SSE) is taken,
wherein C1, C2, . . . , CK represent a set of clusters, and K represents the number of partitioned clusters, i.e., the number of working condition types.
The generating the working condition data set and the working condition process data set comprises:
firstly, marking the cluster partitions (C1, C2, . . . , CK) of the data set (xi1, xi2, . . . , xin) with the working condition types to form a working condition data set expressed as (xi1, xi2, . . . , xin, yk); and simultaneously, respectively computing the central points of the cluster partitions to form a central point data set (ck1, ck2, . . . , ckn, yk), wherein y represents the working condition types and the number of y is the same as the number of the cluster partitions, i.e., k≤K; c represents parameters corresponding to the working condition data set (xi1, xi2, . . . , xin, yk);
then, computing a distance from each data in a cluster to a central node in the cluster, and taking a maximum distance value Dmax;
finally, adding the time sequence labels for the working condition data set by taking the time sequence data set as a reference, to form a working condition process data set expressed as (ti, xi1, xi2, . . . , xin, yk), wherein y represents the working condition types and the number of y is the same as the number of the cluster partitions, i.e., k≤K; ti represents the time sequence labels which are gradually increased.
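The working condition transition probability model data set is $P(y_{a_1}, y_{a_2}, y_{a_3}, \ldots, y_{a_M}, y_{a_{M+1}})$, wherein M is the size of the sliding window; K is the number of the working condition types; $1 \le a_1, a_2, a_3, \ldots, a_M, a_{M+1} \le n$; and n represents the number of the parameters after dimension reduction.
In the working condition transition mode $y_{a_1}, y_{a_2}, y_{a_3}, \ldots, y_{a_M}, y_{a_{M+1}}$, each element is a working condition type, and the elements are taken in chronological order.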
The collecting the data, and detecting and processing the data comprises:
collecting the data and taking n-dimensional parameters as input data (x′1, x′2, . . . , x′n), wherein n represents the number of the parameters after dimension reduction, and the parameters are the same as the parameters selected in the data set (xi1, xi2, . . . , xin) after dimension reduction; computing a distance from the input data to the central point data set, and taking a minimum value d of the distance;
if d≤Dmax, taking the working condition type of the central point with a distance of d; adding the time sequence labels to form time sequence data (t′, x′1, x′2, . . . , x′n, y′); and saving the data into a data set (t′i, x′i1, x′i2, . . . , x′in, y′k′) to be processed;
if d>Dmax, which indicates that the input data is not matched with any working condition type, modifying the working condition data set and the central point data set, wherein Dmax represents the maximum value of the distance from each data in the cluster to the central node in the cluster.
The step 6 comprises:
continuously taking the working condition transition mode (yi, yi+1, . . . , yi+M−1, yi+M) with a sliding window size of M from the data set (t′i, x′i1, x′i2, . . . , x′in, y′k′) to be processed according to the chronological order; inquiring and counting the probability p of the mode in the working condition transition probability model; if p>ϵ, continuing to compute the working condition of the time sequence of a next group of data parameters; if 0≤p≤ϵ, correcting a corresponding probability in the working condition transition probability model, wherein ϵ represents a probability value defined according to expert knowledge.
The correcting of the corresponding probability in the working condition transition probability model comprises:
when p=0, adding a probability value of the working condition transition mode to be corrected to the working condition transition probability model, recorded as ϵ; accordingly, reducing the probability values of other working condition transition modes in the data set of the working condition transition probability model on average;
when 0<p≤ϵ, modifying the probability value of the working condition transition mode to be corrected to the working condition transition probability model, recorded as p+ϵ; accordingly, reducing the probability values of other working condition transition modes in the data set of the working condition transition probability model on average,
wherein ∈ represents a probability value defined according to expert knowledge, and ∈=ϵ.
The present invention has the following beneficial effects and advantages:
1. The present invention is based on a statistical modeling method and introduces expert prior knowledge to correct the established model gradually, so that the model range covers the overall system working condition state, which solves the problem of low coverage in the mechanism analysis modeling methods and the statistical modeling methods.
2. The present invention can be used as the input of an abnormal working condition diagnosis method, and can effectively improve the accuracy rate of abnormality diagnosis.
The present invention will be further described in detail below in combination with the drawings and the embodiments.
To make the above-mentioned purpose, features and advantages of the present invention more clear and understandable, specific embodiments of the present invention will be described below in detail in combination with the drawings. In the following description, many specific details are elaborated to thoroughly understand the present invention. However, the present invention can be implemented in other modes different from those described herein. Those skilled in the art can make similar improvement without departing from the connotation of the present invention. Therefore, the present invention is not limited by specific embodiments disclosed below.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as those generally understood by those skilled in the art in the present invention. The terms used in the description of the present invention are intended to merely describe concrete embodiments, not to limit the present invention.
Step 1: collecting data, and forming time sequence data. The collected data is taken from a real-time database in a site production process and is represented as (x1, x2, . . . , xm), wherein m represents the number of parameters. The time sequence labels are marked to form a time sequence data set represented as (ti, xi1, xi2, . . . , xim), wherein ti represents the time sequence labels which are gradually increased, and x represents different parameters.
Step 2: preprocessing the time sequence data set. The preprocessing deletes irrelevant parameters from the time sequence data set (ti, xi1, xi2, . . . , xim) to obtain a time sequence data set after dimension reduction, represented as (ti, xi1, xi2, . . . , xin), n≤m, wherein n represents the number of the parameters after dimension reduction and x represents different parameters. The specific dimension reduction process is as follows:
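respectively computing a variance for each dimension of the parameters to obtain (σ1, σ2, . . . , σm); computing the mean value of the variances $\bar{\sigma} = \frac{1}{m}\sum_{j=1}^{m}\sigma_j$; and
deleting the values in (σ1, σ2, . . . , σm) that are less than a threshold based on this mean value.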
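The variance-based dimension reduction of step 2 can be sketched as follows. This is only an illustration under stated assumptions: the function name `drop_low_variance` and the sample values are hypothetical, the time labels are assumed to be kept separately from the parameter columns, and the cut-off defaults to the mean variance itself, which the text above does not confirm.

```python
from statistics import fmean, pvariance


def drop_low_variance(rows, threshold=None):
    """Keep only the parameter columns whose variance reaches the threshold.

    rows: list of equal-length lists, one list of parameter values per time label.
    threshold: assumed cut-off; when omitted, the mean of the column variances is used.
    Returns (reduced_rows, kept_column_indices).
    """
    columns = list(zip(*rows))
    variances = [pvariance(col) for col in columns]
    if threshold is None:
        # Assumption: the cut-off is the mean variance itself.
        threshold = fmean(variances)
    kept = [j for j, v in enumerate(variances) if v >= threshold]
    reduced = [[row[j] for j in kept] for row in rows]
    return reduced, kept


# Hypothetical data: column 0 is constant, column 2 barely moves.
rows = [
    [1.0, 10.0, 5.0],
    [1.0, 20.0, 5.1],
    [1.0, 30.0, 4.9],
    [1.0, 40.0, 5.0],
]
reduced, kept = drop_low_variance(rows)
print(kept)  # only column 1 survives
```

A cut-off tied to the mean variance is sensitive to the units of each parameter, so in practice the parameters may need to be brought to a comparable scale before the variances are compared.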
Step 3: clustering the preprocessed time sequence data set, computing a central point data set of the cluster, and generating a working condition data set and a working condition process data set, and comprising the following specific steps:
Firstly, clustering the preprocessed time sequence data set with a k-means algorithm; the time labels are neglected during clustering, i.e., the time labels have no influence on the clustering result. Input: a data set (xi1, xi2, . . . , xin) after dimension reduction, and the range [Kmin, Kmax] of k values, which needs to be determined according to expert knowledge. Process: conducting k-means clustering on the data set (xi1, xi2, . . . , xin) after dimension reduction for each k value, and solving the sum of squared errors (SSE) value in clusters for each clustering result. Output: the cluster partitions C=(C1, C2, . . . , CK) obtained when min(SSE) is taken, wherein C1, C2, . . . , CK represent a set of clusters, and K represents the number of partitioned clusters, i.e., the number of working condition types.
Then, marking the cluster partitions (C1, C2, . . . , CK) of the data set (xi1, xi2, . . . , xin) with the working condition types according to the expert knowledge to form a working condition data set expressed as (xi1, xi2, . . . , xin, yk); and simultaneously, respectively computing the central points of the cluster partitions to form a central point data set (ck1, ck2, . . . , ckn, yk), wherein y represents the working condition types and the number of y is the same as the number of the cluster partitions, i.e., k≤K; c represents parameters corresponding to the working condition data set (xi1, xi2, . . . , xin, yk).
Next, computing a distance from each data in a cluster to a central node in the cluster, and taking a maximum distance value Dmax.
Finally, adding the time sequence labels for the working condition data set by taking the time sequence data set as a reference, to form a working condition process data set expressed as (ti, xi1, xi2, . . . , xin, yk), wherein y represents the working condition types and the number of y is the same as the number of the cluster partitions, i.e., k≤K; ti represents the time sequence labels which are gradually increased.
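The clustering of step 3 can be sketched as below: k-means is run for every k in [Kmin, Kmax], the partition with the smallest SSE is kept, and Dmax is taken as the largest distance from a sample to the centre of its own cluster. This is a minimal sketch, not the authors' implementation; the helper names (`assign`, `kmeans`, `build_working_conditions`), the fixed random seed and the sample points are all assumptions made for illustration.

```python
import math
import random


def assign(points, centers):
    """Index of the nearest centre (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means. Returns (centers, labels, sse)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centre.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, sse


def build_working_conditions(points, k_min, k_max):
    """Try every k in [k_min, k_max] and keep the partition with the smallest SSE."""
    results = [kmeans(points, k) for k in range(k_min, k_max + 1)]
    centers, labels, _ = min(results, key=lambda r: r[2])
    # D_max: largest distance from any sample to the centre of its own cluster.
    d_max = max(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, d_max


# Hypothetical samples forming two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, labels, d_max = build_working_conditions(points, k_min=2, k_max=3)
print(len(centers), labels, round(d_max, 3))
```

Because the SSE generally falls as k grows, selecting by min(SSE) alone tends to favour the upper end of the range, which is one reason the range [Kmin, Kmax] has to be bounded by expert knowledge.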
Step 4: counting a working condition transition probability for the working condition process data set to form a working condition transition probability model data set. The working condition transition probability is counted for the working condition process data set (ti, xi1, xi2, . . . , xin, yk) of step 3 according to the size M of a sliding window, and the formed working condition transition probability model data set is represented as $P(y_{a_1}, y_{a_2}, y_{a_3}, \ldots, y_{a_M}, y_{a_{M+1}})$, wherein
K is the number of the working condition types; $1 \le a_1, a_2, a_3, \ldots, a_M, a_{M+1} \le n$; and n represents the number of the parameters after dimension reduction.
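The counting in step 4 can be sketched as follows. The sketch assumes that each transition mode is a run of M+1 consecutive working condition labels and that its probability is its relative frequency among all such runs; the text does not state the exact normalisation, so this is one plausible reading. The function name `transition_model` and the labels are hypothetical.

```python
from collections import Counter


def transition_model(labels, window):
    """Map each run of `window + 1` consecutive labels to its relative frequency."""
    size = window + 1
    patterns = [tuple(labels[i:i + size]) for i in range(len(labels) - size + 1)]
    counts = Counter(patterns)
    total = len(patterns)
    return {pattern: n / total for pattern, n in counts.items()}


# Hypothetical working condition labels in chronological order.
labels = ["A", "A", "B", "A", "A", "B", "C"]
model = transition_model(labels, window=2)
for pattern, p in sorted(model.items()):
    print(pattern, p)
```

With a window of 2 the seven labels yield five runs of three labels, and the run ("A", "A", "B") occurs twice, so its relative frequency is 2/5.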
Step 5: continuing to collect the data after the model is built, and correcting the original model. The data is collected and n-dimensional parameters are taken as input data (x′1, x′2, . . . , x′n), wherein n represents the number of the parameters after dimension reduction, and the parameters are the same as the parameters selected in the data set (xi1, xi2, . . . , xin) after dimension reduction. A distance from the input data to each central point in the central point data set is computed, and a minimum value d of the distance is taken. If d≤Dmax, the working condition type of the central point with a distance of d is taken, the time sequence labels are added to form time sequence data (t′, x′1, x′2, . . . , x′n, y′), and the data is saved into a data set to be processed. If d>Dmax, the input data is not matched with any working condition type, and the working condition data set and the central point data set are modified, wherein Dmax represents the maximum value of the distance from each data in the cluster to the central node in the cluster.
(1) The process of modifying the working condition data set is as follows:
directly adding the data (x′1, x′2, . . . , x′n, y′) to the working condition data set (xi1, xi2, . . . , xin, yk).
(2) The process of modifying the central point data set is as follows:
directly adding the data (x′1, x′2, . . . , x′n, y′) to the central point data set (ck1, ck2, . . . , ckn, yk).
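The matching of step 5 can be sketched as a nearest-centre lookup guarded by Dmax. This is an illustration only: the function name `match_working_condition`, the centre coordinates and the label names are hypothetical, and the label given to an unmatched sample is a placeholder, since the text does not say how y′ is chosen in that case.

```python
import math


def match_working_condition(sample, centers, center_labels, d_max):
    """Return the label of the nearest centre, or None if it is farther than d_max."""
    distances = [math.dist(sample, c) for c in centers]
    nearest = min(range(len(centers)), key=distances.__getitem__)
    return center_labels[nearest] if distances[nearest] <= d_max else None


# Hypothetical central point data set and D_max from the modeling phase.
centers = [[0.0, 0.0], [10.0, 10.0]]
center_labels = ["idle", "full_load"]
d_max = 1.5

known = match_working_condition([0.5, 0.5], centers, center_labels, d_max)
unknown = match_working_condition([5.0, 5.0], centers, center_labels, d_max)

if unknown is None:
    # Unmatched sample: add it directly to the central point data set.
    # Assumption: the new label would come from an expert; this one is a placeholder.
    centers.append([5.0, 5.0])
    center_labels.append("new_condition")

print(known, unknown)
```

After the unmatched sample has been added as a new central point, later samples close to it are matched to the new working condition type instead of being rejected.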
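Step 6: computing a working condition state transition mode phase by phase and processing. The working condition transition mode is defined as $y_{a_1}, y_{a_2}, y_{a_3}, \ldots, y_{a_M}, y_{a_{M+1}}$. The working condition transition mode is continuously taken with a sliding window size of M from the data set to be processed according to the chronological order, and its probability p is inquired in the working condition transition probability model; if p>ϵ, the next group of data is processed; if 0≤p≤ϵ, the corresponding probability in the working condition transition probability model is corrected, wherein ϵ represents a probability value defined according to expert knowledge.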
The process of correcting the working condition transition probability model is specifically as follows:
(1) When p=0, it indicates that the working condition transition mode appears for the first time.
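The working condition transition mode to be added is assumed to be $y_{a_1}, y_{a_2}, y_{a_3}, \ldots, y_{a_M}, y_{a_{M+1}}$.
The probability value $P(y_{a_1}, y_{a_2}, y_{a_3}, \ldots, y_{a_M}, y_{a_{M+1}})$ of this mode, recorded as ∈, is added to the working condition transition probability model; accordingly, the probability values of the other working condition transition modes in the model are reduced on average.
(2) When 0<p≤ϵ, it indicates that the appearance probability of the working condition transition mode is very low. The working condition transition mode to be modified is assumed to be $y_{a_1}, y_{a_2}, y_{a_3}, \ldots, y_{a_M}, y_{a_{M+1}}$.
The probability $P(y_{a_1}, y_{a_2}, y_{a_3}, \ldots, y_{a_M}, y_{a_{M+1}})$ of this mode in the working condition transition probability model is modified to p+∈; accordingly, the probability values of the other working condition transition modes in the model are reduced on average,
wherein ∈ represents a probability value defined according to expert knowledge, and ∈<ϵ.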
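The correction of step 6 can be sketched as follows. The sketch assumes that "reduced on average" means the added probability is taken in equal shares from all other transition modes, which is one reading of the text rather than a confirmed rule. The function name `correct_model`, the parameter names `threshold` (standing for ϵ) and `increment` (standing for ∈), and the numeric values are hypothetical.

```python
def correct_model(model, pattern, threshold, increment):
    """Raise the probability of a rare or unseen mode and lower the others equally.

    model: dict mapping label tuples to probabilities (modified in place).
    threshold: expert-chosen bound at or below which a mode counts as rare.
    increment: expert-chosen amount added to the probability of that mode.
    Returns True when a correction was applied.
    """
    p = model.get(pattern, 0.0)
    if p > threshold:
        return False  # common mode: nothing to correct
    others = [key for key in model if key != pattern]
    model[pattern] = p + increment  # p == 0 gives `increment`, otherwise p + increment
    # Assumption: an equal share is taken from every other mode.
    share = increment / len(others) if others else 0.0
    for key in others:
        # Clamping at zero keeps values valid but can leave the total slightly above 1.
        model[key] = max(model[key] - share, 0.0)
    return True


# Hypothetical model with two known transition modes.
model = {("A", "B"): 0.6, ("B", "A"): 0.4}
changed = correct_model(model, ("B", "B"), threshold=0.05, increment=0.02)
print(changed, model)
```

Taking equal shares keeps the probabilities summing to one as long as no other mode is pushed below zero; a proportional reduction would be another way to read the same sentence.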
Number | Date | Country | Kind |
---|---|---|---|
201811541159.9 | Dec 2018 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2019/075663 | 2/21/2019 | WO | 00 |