Information
Patent Application
20230297597
Publication Number
20230297597
Date Filed
October 13, 2022
Date Published
September 21, 2023
Inventors
Original Assignees
- Guangdong University of Petrochemical Technology
CPC
- G06F16/285
- G06F16/2462
- G06F16/2465
International Classifications
Abstract
Disclosed is an autonomous mining method of industrial big data based on model sets, comprising the following steps: S1, building model sets and a mining engine based on domain knowledge and the structural characteristics of multi-source heterogeneous data; S2, sampling the multi-source heterogeneous data and computing a fault-tolerant estimate of the random error variance; S3, mining the data sets with the mining engine, and determining the optimal fault-tolerant model of each sampled data sequence and the optimal fault-tolerant estimates of the model parameters; S4, calculating goodness-of-fit statistics and performing a VV&A (verification, validation and accreditation) test using the optimal fault-tolerant model; S5, obtaining the data model representation and connotation knowledge based on model clustering. The method enables automation of the big data mining process, integration of associated knowledge, expansion of the model sets, integration of mining and modeling, and optimization of the mining results.
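Steps S4 and S5 can be illustrated with a minimal Python sketch. The application does not say which goodness-of-fit statistic is used or how model structures are compared, so two assumptions are made here: the coefficient of determination stands in for the goodness-of-fit statistic, and two models are treated as structurally equal when they share the same (model class, order) pair. The names r_squared, cluster_by_structure, and example_fits, and the series labels, are hypothetical.

```python
from collections import defaultdict


def r_squared(y, y_hat):
    """Coefficient of determination between observations and fitted values.

    Assumption: used here as a stand-in for the goodness-of-fit statistic in S4.
    """
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def cluster_by_structure(fits):
    """Group data series by the structure of their fitted model.

    fits maps a series name to a (model_class, order) pair. Series whose
    models share the same pair land in the same cluster, so the models are
    clustered first and the data follow the model they belong to.
    """
    clusters = defaultdict(list)
    for name, structure in fits.items():
        clusters[structure].append(name)
    return dict(clusters)


# Hypothetical mining results: series name -> (model class, order).
example_fits = {
    "series_a": ("polynomial", 1),
    "series_b": ("polynomial", 1),
    "series_c": ("AR", 2),
}
```

Calling cluster_by_structure(example_fits) groups series_a and series_b under the first-order polynomial structure and leaves series_c alone under the second-order autoregressive structure. A real implementation would probably need a softer notion of structural similarity than exact equality.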
Claims
- 1. An autonomous mining method of industrial big data based on model sets, comprising the following steps:
S1, constructing model sets by a data structure analysis and a characteristic analysis based on domain knowledge; constructing a mining engine by a modal decomposition of time series data based on structural characteristics of multi-source heterogeneous data;
S2, sampling the multi-source heterogeneous data, and performing fault-tolerant estimation on random error variance of sampled data, wherein the sampled data is a time series data sequence automatically extracted from an engineering data warehouse;
S3, mining a data set by using the mining engine, automatically extracting the time series data sequence from the engineering data warehouse, and taking a time series data change process as a superposition of three modes: subject change component, random disturbance component and abnormal change component by adopting a multi-modal additive hypothesis; adopting a fault-tolerant fitting method of a subject component curve to eliminate influence of outlier abnormal change component and realize a fault-tolerant estimation of error variance of the random disturbance component in the sampled data; substituting a φ-function into the mining engine to determine an optimal fault-tolerant model of each sampled data sequence and an optimal fault-tolerant estimation of model parameters;
S4, performing a goodness of fit statistics calculation and a VV&A test by using the optimal fault-tolerant model; and
S5, clustering the model according to structures to obtain a clustering of the model, and clustering the data according to the model based on the clustering of the model to obtain data model representation and connotation knowledge.
- 2. The autonomous mining method of industrial big data based on model sets according to claim 1, wherein the model sets comprise a time series analysis model class, a regression analysis model class, a time-varying curve fitting model class and a batch process model class with fault data.
- 3. The autonomous mining method of industrial big data based on model sets according to claim 2, wherein the mining engine is a fault-tolerant mining engine.
- 4. The autonomous mining method of industrial big data based on model sets according to claim 3, wherein a construction method of the fault-tolerant mining engine is as follows:
selecting and combining one of the four model classes with the data set in a data cluster to build a least square mining engine; and
taking a heavily attenuated integral function as a loss function instead of a least square integral function in the least square mining engine to obtain a fault-tolerant mining engine.
- 5. The autonomous mining method of industrial big data based on model sets according to claim 1, wherein the VV&A test comprises: checking the mined optimal fault-tolerant model to confirm the rationality of the model sets used in the mining process; then investigating the consistency between the knowledge expressed by the optimal fault-tolerant model and the data, and testing the goodness of fit on the associated data set; and finally, validating the knowledge of the mining model by identifying the mining models and knowledge through model validation.
- 6. The autonomous mining method of industrial big data based on model sets according to claim 2, wherein the time series analysis model class comprises an autoregressive model, a moving average model, an autoregressive moving average model, a periodic autoregressive moving average model and a controlled autoregressive model.
- 7. The autonomous mining method of industrial big data based on model sets according to claim 2, wherein the regression analysis model class comprises a linear regression model, a nonlinear regression model and a Logistic model.
- 8. The autonomous mining method of industrial big data based on model sets according to claim 2, wherein the time-varying curve fitting model class comprises a polynomial fitting model, a trigonometric polynomial fitting model and a periodic progressive model.
- 9. The autonomous mining method of industrial big data based on model sets according to claim 1, wherein step S2 comprises:
automatically extracting the time series data sequence from the multi-source heterogeneous data;
taking a time series data change process as a superposition of three modes: subject change component, random disturbance component and abnormal change component by adopting a multi-modal additive hypothesis; and
eliminating the influence of the abnormal change component by using the fault-tolerant fitting method of the subject component curve, so as to obtain the fault-tolerant estimation of error variance of the random disturbance component in the sampled data.
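The fault-tolerant fitting described in claims 1, 4 and 9 can be sketched in Python under stated assumptions. The claims do not give the φ-function or the "heavily attenuated" loss in closed form, so Tukey's biweight, a standard redescending weight that drops to zero for large residuals, is swapped in as a stand-in. The subject change component is assumed to be a straight line, and the error variance of the random disturbance component is estimated from the median absolute residual. All function names, the tuning constant 4.685, and the synthetic series are illustrative and do not come from the application.

```python
import random
import statistics


def tukey_weight(u, c=4.685):
    """Tukey biweight weight: close to 1 for small |u|, exactly 0 once |u| >= c."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


def weighted_line_fit(t, y, w):
    """Weighted least-squares fit of y ~ a + b*t; returns (a, b)."""
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - t_mean) * (yi - y_mean) for wi, ti, yi in zip(w, t, y))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def robust_sigma(residuals):
    """Scale estimate from the median absolute residual.

    The factor 1.4826 makes it consistent with the standard deviation
    when the random disturbance is Gaussian.
    """
    return 1.4826 * statistics.median([abs(r) for r in residuals])


def fault_tolerant_fit(t, y, iterations=20):
    """Iteratively reweighted line fit; returns (a, b, error_variance_estimate)."""
    a, b = weighted_line_fit(t, y, [1.0] * len(t))  # ordinary least-squares start
    for _ in range(iterations):
        residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
        sigma = robust_sigma(residuals)
        if sigma == 0.0:
            break  # at least half of the points are already fitted exactly
        # Points with large scaled residuals (abnormal changes) get weight 0.
        weights = [tukey_weight(r / sigma) for r in residuals]
        a, b = weighted_line_fit(t, y, weights)
    residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, robust_sigma(residuals) ** 2


def make_series(n=200, seed=0):
    """Synthetic series: linear subject component + Gaussian noise + outliers."""
    rng = random.Random(seed)
    t = [float(i) for i in range(n)]
    y = [2.0 + 0.5 * ti + rng.gauss(0.0, 1.0) for ti in t]  # subject + disturbance
    for i in range(10, n, 20):
        y[i] += 50.0  # abnormal change component: sparse large jumps
    return t, y
```

With t, y = make_series(), calling fault_tolerant_fit(t, y) should return an intercept and slope close to the generating values 2.0 and 0.5 and a variance estimate near 1, whereas weighted_line_fit with unit weights (plain least squares) gives an intercept pulled upward by the outliers. Replacing the unit weights with the redescending weights is the analogue, in this sketch, of swapping the least square loss for an attenuated one.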
Priority Claims (1)
Number | Date | Country | Kind
CN202111168737.0 | Sep 2021 | CN | national