TECHNICAL FIELD
The present disclosure relates to the technical field of soft sensors for dioxin (DXN) emission, in particular to a broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in a municipal solid waste incineration (MSWI) process.
BACKGROUND
Municipal solid waste incineration (MSWI), a global solution to “waste besieging cities” at present, features safe disposal, waste reduction and resource recycling. MSWI emits dioxin (DXN) and is required to minimize such emission, since DXN is a persistent and highly toxic organic pollutant in the waste gas and gives rise to the “not-in-my-back-yard (NIMBY)” phenomenon when incineration plants are constructed. An off-line assay and analysis method based on high-resolution gas chromatography/high-resolution mass spectrometry (HRGC/HRMS) is the current central tool for measuring a DXN emission concentration. The method suffers from great technical difficulty, a long time lag and high labor and economic costs, and has in turn become a key factor hindering real-time optimal control over the MSWI process. To this end, on-line measurement of the DXN emission concentration has become a foremost challenge during MSWI.
To solve the above problems, attention has been focused on online indirect measurement of the DXN concentration by constructing a correlation model with DXN-related substances that can be measured online. However, this method has drawbacks such as apparatus complexity, high cost, susceptibility to interference and, being in essence a measurement method combined with data modeling, unpredictable prediction accuracy. Compared with off-line analysis and on-line indirect measurement methods, soft sensor technology driven by easy-to-measure process data collected through an industrial distributed control system can effectively solve the problem that the DXN concentration cannot be measured online. With stability, accuracy and quick response, soft sensor technology has been extensively applied to difficult-to-measure parameters in complex industrial processes of petroleum, chemical industry, steelmaking, etc.
SUMMARY
An objective of the present disclosure is to provide a broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in a municipal solid waste incineration (MSWI) process. A soft sensor modeling algorithm based on broad hybrid forest regression (BHFR) is provided with measurement of a DXN emission concentration in the MSWI process as the target.
To achieve the above objective, the present disclosure provides the following technical solutions.
A broad hybrid forest regression (BHFR)-based soft sensor method for dioxin (DXN) emission in a municipal solid waste incineration (MSWI) process includes: based on a broad learning system (BLS) framework, constructing a BHFR soft sensor model for small sample high-dimensional data by replacing a neuron with a non-differential base learner, where the BHFR soft sensor model includes a feature mapping layer, a latent feature extraction layer, a feature incremental layer and an incremental learning layer, and the method specifically includes:
- S1, constructing the feature mapping layer, and mapping a high-dimensional feature by constructing a hybrid forest group composed of a random forest (RF) and a completely random forest (CRF);
- S2, constructing the latent feature extraction layer, extracting a latent feature from a feature space of a fully connected hybrid matrix according to a contribution rate, guaranteeing maximum transmission and minimum redundancy of potentially valuable information based on an information measurement criterion, and reducing model complexity and computation consumption;
- S3, constructing the feature incremental layer, and further enhancing a feature representation capacity by training the feature incremental layer based on an extracted latent feature;
- S4, constructing the incremental learning layer based on an incremental learning strategy, obtaining a weight matrix with a Moore-Penrose pseudo-inverse, and implementing high-precision modeling of the BHFR soft sensor model;
- S5, verifying the soft sensor model by using an industrial process DXN data set; and
- S6, performing soft sensing of DXN emission in the MSWI process with the soft sensor model constructed in steps S1-S5.
Further, S1 of constructing the feature mapping layer, and mapping a high-dimensional feature by constructing a hybrid forest group composed of a RF and a CRF specifically includes:
- setting original data as {X,y}, where XϵRNRaw×M represents original input data that are sourced from six different stages in the MSWI process and are collected and stored in a distributed control system (DCS) in seconds, NRaw represents a number of the original data, M represents a dimension of the original input data, and yϵRNRaw×1 represents an output truth of a DXN emission concentration sourced from an emission DXN measurement sample obtained through an off-line measurement method; describing a modeling process of the feature mapping layer by taking an nth hybrid forest group of the feature mapping layer as an example:
- obtaining J training subsets of a hybrid forest group model by performing Bootstrap and random subspace (RSM) sampling on {X,y} as follows:
- where XBootstrapn,j and yBootstrapn,j represent an input and an output of the jth training subset respectively, ϕnFML(⋅) and φnFML(⋅) represent Bootstrap sampling and RSM sampling of the nth hybrid forest group in the feature mapping layer respectively, and PBootstrap represents a Bootstrap sampling probability;
- training a hybrid forest algorithm including J decision trees based on {XBootstrapn,j,yBootstrapn,j}j=1J, where a jth decision tree of the nth hybrid forest group in the feature mapping layer is expressed as follows:
where L represents a number of leaf nodes of the decision tree, I(⋅) represents an indicator function, and cl is computed through recursive splitting;
- expressing a splitting loss function Ωi(⋅) of the decision tree in the RF (a reconstruction of the omitted formulas is given after this list) as:
- where Ωi(s,v) represents the loss function value of the splitting criterion when the sth feature is split at value v, yL represents a truth vector of a DXN emission concentration at a left leaf node, E[yL] represents a mathematical expectation of yL, yR represents a truth vector of a DXN emission concentration at a right leaf node, E[yR] represents a mathematical expectation of yR, yLi represents the ith DXN emission concentration truth at the left leaf node, yRi represents the ith DXN emission concentration truth at the right leaf node, cL represents a predicted output of the DXN emission concentration at the left leaf node, and cR represents a predicted output of the DXN emission concentration at the right leaf node;
splitting, by minimizing Ωi(s,v), a training set (XBootstrapn,j,yBootstrapn,j) into two tree nodes as follows:
- where RLNL×M and RRNR×M represent sample sets included in a left tree node and a right tree node after division respectively, NL and NR represents a number of samples in RLNL×M and RRNR×M respectively;
- expressing predicted output values cLRF and cRRF of the DXN emission concentration at a current left tree node and a current right tree node as sample truth expectations as follows:
- where yL and yR represent truth vectors of the DXN emission concentration in RLNL×M and RRNR×M, and E[yL] and E[yR] represent mathematical expectations of yL and yR;
- different from the RF, expressing a splitting loss function in the CRF in a completely random selection mode as follows:
- where rand{(s,v)i}i=1NRaw×M represents that a value v of the sth feature is completely randomly selected as a split point;
- expressing predicted output values cLCRF and cRCRF of the DXN emission concentration at a left tree node and a right tree node that are randomly split as sample truth expectations as follows:
- expressing, through the above process, the nth hybrid forest group fnFML(⋅) as follows:
- where fn,RFFML(⋅) represents the nth random forest, and fn,CRFFML(⋅) represents the nth completely random forest;
- expressing an nth mapped feature Zn as follows:
- where (c1,ln,RF,c1,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of a first sample of original input data from six different stages in the MSWI process, (cnRaw,ln,RF,cnRaw,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of an nRawth sample of original input data from six different stages in the MSWI process, and (cNRaw,ln,RF,cNRaw,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of an NRawth sample of original input data from six different stages in the MSWI process; and
- expressing an output of the feature mapping layer as:
- where Z1 represents a first mapped feature, Z2 represents a second mapped feature, ZN represents an Nth mapped feature, and a mapped feature matrix ZN includes NRaw samples and a 2N dimensional feature.
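The display formulas of S1 are omitted above (they were figures in the original filing). A plausible reconstruction in standard notation, assuming the usual CART least-squares splitting criterion and agreeing with the definitions in the surrounding text, is:

\[ f_{n,j}^{DT}(x)=\sum_{l=1}^{L}c_{l}\,I(x\in R_{l}),\qquad \Omega_{i}(s,v)=\sum_{y_{L}^{i}\in y_{L}}\bigl(y_{L}^{i}-c_{L}\bigr)^{2}+\sum_{y_{R}^{i}\in y_{R}}\bigl(y_{R}^{i}-c_{R}\bigr)^{2} \]

\[ c_{L}^{RF}=E[y_{L}],\qquad c_{R}^{RF}=E[y_{R}],\qquad (s,v)_{CRF}=\mathrm{rand}\{(s,v)_{i}\} \]

\[ Z_{n}=f_{n}^{FML}(X)=\bigl[f_{n,RF}^{FML}(X)\mid f_{n,CRF}^{FML}(X)\bigr]\in\mathbb{R}^{N_{Raw}\times 2},\qquad Z^{N}=[Z_{1},Z_{2},\ldots,Z_{N}]\in\mathbb{R}^{N_{Raw}\times 2N} \]

On this reading, the CRF uses the same least-squares loss but draws its split point completely at random, as stated above.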
Further, S2 of constructing the latent feature extraction layer, extracting a latent feature from a feature space of a fully connected hybrid matrix according to a contribution rate, guaranteeing maximum transmission and minimum redundancy of potentially valuable information based on an information measurement criterion, and reducing model complexity and computation consumption specifically includes:
- obtaining and expressing a fully connected hybrid matrix A by combining original input data X from six different stages in the MSWI process with a mapped feature matrix ZN as follows:
- where A includes NRaw samples and a (M+2N) dimensional feature;
- considering that a dimension of A is much higher than a dimension of the original data, minimizing redundant information in A through principal component analysis (PCA), and computing a correlation matrix R of A as follows:
- obtaining (M+2N) feature values and corresponding feature vectors by performing singular value decomposition on R as follows:
- where U(M+2N) represents a (M+2N) order orthogonal matrix, Σ(M+2N) represents a (M+2N) order diagonal matrix, and V(M+2N) represents a (M+2N) order orthogonal matrix;
- where σ1>σ2> . . . >σ(M+2N) represents eigenvalues arranged in a descending order;
- determining a number of final principal components according to a set latent feature contribution threshold η,
- where a number of latent features is QPCA«(M+2N);
- based on the QPCA latent features determined above, obtaining a feature vector matrix VQPCA (that is, a projection matrix of A) corresponding to the eigenvalue set {σq}q=1QPCA, minimizing redundant information by performing feature projection on A, and recording an obtained latent feature as XPCA, that is,
- where VQPCAϵR(M+2N)×QPCA represents the eigenvectors of the first QPCA latent features;
- computing a mutual information value IMI between a selected latent feature XPCA and a truth yϵRNRaw×1 (see the reconstruction following this list) as follows:
- where p(xqPCA, y) represents a joint probability distribution of the qth latent feature xqPCA and a DXN emission concentration truth y, p(xqPCA) represents a marginal probability distribution of the qth latent feature xqPCA, and p(y) represents a marginal probability distribution of the DXN emission concentration truth y;
- guaranteeing a correlation between the selected latent feature and the truth through an information maximization selection mechanism, and expressing the correlation as:
- where {IqMI}q=1QPCA represents the mutual information values between the QPCA latent features xqPCA and the truth y, ζ represents an information maximization threshold, and the retained features are the QPCAMI latent features having a greatest correlation with information of the DXN emission concentration truth y; and
- obtaining a new data set {X′, y}ϵRNRaw×(QPCAMI+1) including QPCAMI latent features, and setting a dimension MPCAMI=QPCAMI after extraction.
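The mutual information computation above is likewise omitted; assuming the standard discrete definition, consistent with the probability terms defined above:

\[ I_{q}^{MI}=\sum_{x_{q}^{PCA}}\sum_{y}p\bigl(x_{q}^{PCA},y\bigr)\log\frac{p\bigl(x_{q}^{PCA},y\bigr)}{p\bigl(x_{q}^{PCA}\bigr)p(y)},\qquad q=1,\ldots,Q_{PCA} \]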
Further, S3 of constructing the feature incremental layer, and further enhancing a feature representation capacity by training the feature incremental layer based on an extracted latent feature specifically includes:
- obtaining J training subsets of a hybrid forest algorithm by performing Bootstrap sampling and RSM sampling on the new data set {X′,y} as follows:
- where X′Bootstrapk,j and yBootstrapk,j represent an input and an output of the jth training subset, X′ and y represent an input and an output of a new training set respectively, ϕkFEL(⋅) represents Bootstrap sampling of the kth hybrid forest group, and φkFEL(⋅) represents RSM sampling of the kth hybrid forest group;
- taking construction of the jth decision tree of the RF in the kth hybrid forest group as an example as follows:
- where fk,jDT-RF(⋅) represents a jth decision tree of the RF in the kth hybrid forest group in the feature incremental layer; L represents a number of leaf nodes of the decision tree; and cl is computed through recursive splitting with specific process formulas of (3)-(5);
- obtaining an RF model in the kth hybrid forest group in the feature incremental layer and expressing the RF model as follows:
- similarly taking construction of the jth decision tree of the CRF in the kth hybrid forest group as an example as follows:
- where fk,jDT-CRF(⋅) represents a jth decision tree of the CRF in the kth hybrid forest group in the feature incremental layer; and cl is computed through recursive splitting with specific process formulas of (6)-(7);
- obtaining a CRF model in the kth hybrid forest group in the feature incremental layer and expressing the CRF model as follows:
- obtaining the kth hybrid forest group fkFEL(⋅) through the above process, and expressing a kth enhanced feature as follows:
- where (c1,lk,RF, c1,lk,CRF) represents enhanced mapping on a first sample in new data through the kth hybrid forest group, (cnRaw,lk,RF,cnRaw,lk,CRF) represents enhanced mapping on an nRawth sample in the new data through the kth hybrid forest group, and (cNRaw,lk,RF,cNRaw,lk,CRF) represents enhanced mapping on an NRawth sample in the new data through the kth hybrid forest group;
- expressing an output HK of the feature incremental layer as follows:
- where H1 represents a first enhanced feature, H2 represents a second enhanced feature, and HK represents a Kth enhanced feature;
- expressing a BHFR model without considering the incremental learning strategy as follows:
- where GK represents a combination of outputs from the feature mapping layer and the feature incremental layer, that is GK=[ZN|HK], and includes NRaw samples and a (2N+2K) dimensional feature; and WK represents a weight of each of the feature mapping layer and the feature incremental layer relative to the output layer, and is computed as follows:
- where I represents a unit matrix, and λ represents a regularization term coefficient; and accordingly, expressing a pseudo-inverse computation of GK as:
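The weight formula referenced above is omitted; a plausible reconstruction, assuming the ridge-regularized least-squares solution standard for broad learning systems:

\[ W^{K}=\Bigl(\lambda I+\bigl(G^{K}\bigr)^{T}G^{K}\Bigr)^{-1}\bigl(G^{K}\bigr)^{T}y,\qquad \bigl(G^{K}\bigr)^{+}=\lim_{\lambda\to 0}\Bigl(\lambda I+\bigl(G^{K}\bigr)^{T}G^{K}\Bigr)^{-1}\bigl(G^{K}\bigr)^{T} \]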
Further, S4 of constructing the incremental learning layer based on an incremental learning strategy, obtaining a weight matrix with Moore-Penrose pseudo-inverse, and implementing high-precision modeling of the BHFR soft sensor model specifically includes:
- obtaining a training subset of the hybrid forest algorithm by performing Bootstrap sampling and RSM sampling on the new data set {X′,y} in a process as follows:
- where X′Bootstrapp,j and yBootstrapp,j represent the input and the output of the jth training subset of the hybrid forest algorithm, X′ and y represent the input and the output of the new training set respectively, and ϕpILL(⋅) and φpILL(⋅) represent Bootstrap sampling and RSM sampling of a pth hybrid forest group in the incremental learning layer;
- constructing decision trees fp,RFILL(⋅) and fp,CRFILL(⋅) in the pth hybrid forest group in the same process (not repeated herein) as in the feature mapping layer and the feature incremental layer;
- under the condition that one hybrid forest group is added, expressing outputs GK+1 of the feature mapping layer, the feature incremental layer and the incremental learning layer as follows:
- where GK=[ZN|HK] includes NRaw samples and a (2N+2K) dimensional feature, and GK+1 includes NRaw samples and a (2N+2K+2J) dimensional feature;
- recursively updating a Moore-Penrose inverse matrix of GK+1 as follows:
- where a matrix C and a matrix D are computed as follows:
- expressing a recursive formula of the Moore-Penrose inverse matrix of GK+1 as follows:
- computing an updating matrix WK+1 of a weight of the feature mapping layer, the feature incremental layer, the incremental learning layer relative to the output layer as follows:
- implementing rapid incremental learning due to the fact that a pseudo-inverse matrix of the hybrid forest group in the incremental learning layer is merely required to be computed according to a pseudo-inverse updating strategy above;
- implementing adaptive incremental learning according to a convergence degree of a training error;
- determining a number P of hybrid forest groups of incremental learning by defining a convergence threshold of the error as θCon, and expressing accordingly an incremental learning training error of the BHFR model as follows:
- where the left-hand side of the above expression represents a training error difference between the (p+1)th and the pth hybrid forest groups of incremental learning, and √((GK+pWK+p−y)2) and √((GK+p+1WK+p+1−y)2) represent training errors of a BHFR model including p hybrid forest groups and a BHFR model including p+1 hybrid forest groups respectively; and
- expressing a predicted output Ŷ of the BHFR soft sensor model as follows:
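The update formulas of S4 are omitted above; the matrices C and D they reference match the pseudo-inverse recursion standard for broad learning systems, which is assumed here as a plausible reconstruction rather than a verbatim reproduction:

\[ G^{K+1}=\bigl[G^{K}\mid H^{K+1}\bigr],\qquad D=\bigl(G^{K}\bigr)^{+}H^{K+1},\qquad C=H^{K+1}-G^{K}D \]

\[ B^{T}=\begin{cases}C^{+}, & C\neq 0\\ \bigl(I+D^{T}D\bigr)^{-1}D^{T}\bigl(G^{K}\bigr)^{+}, & C=0\end{cases}\qquad \bigl(G^{K+1}\bigr)^{+}=\begin{bmatrix}\bigl(G^{K}\bigr)^{+}-DB^{T}\\ B^{T}\end{bmatrix},\qquad W^{K+1}=\begin{bmatrix}W^{K}-DB^{T}y\\ B^{T}y\end{bmatrix} \]

On this reading, the predicted output referenced in the final step is \( \hat{Y}=G^{K+P}W^{K+P} \).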
According to a specific embodiment of the present disclosure, the present disclosure has the following technical effects: the BHFR-based soft sensor method for DXN emission in an MSWI process according to the present disclosure constructs a soft sensor model based on the BHFR, and the soft sensor model combines algorithms such as broad learning modeling, ensemble learning and latent feature extraction: 1) based on a broad learning system (BLS) framework, a non-differential learner is used to construct the soft sensor model including a feature mapping layer, a latent feature extraction layer, a feature incremental layer and an incremental learning layer; 2) internal information of the BHFR model is processed through full information connection, latent feature extraction and mutual information measurement, so as to effectively ensure maximum transmission and minimum redundancy of internal feature information of the BHFR model; 3) a hybrid forest group is used as a mapping unit for incremental learning in a modeling process, a weight matrix of an output layer is rapidly computed according to a pseudo-inverse strategy, and the incremental learning is adaptively adjusted based on a convergence degree of a training error, thereby implementing high-precision soft sensor modeling. The validity and rationality of the method are verified by using an industrial process dioxin (DXN) data set.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings required in the embodiments are briefly described below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and other drawings can be derived from these accompanying drawings by those of ordinary skill in the art without creative efforts.
FIG. 1 is a flowchart of a broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in a municipal solid waste incineration (MSWI) process according to an embodiment of the present disclosure;
FIG. 2 is a technical flowchart of an MSWI process according to an embodiment of the present disclosure;
FIG. 3 is a training error convergence curve according to an embodiment of the present disclosure;
FIG. 4A is a fitting curve of a training set in a DXN data set according to an embodiment of the present disclosure;
FIG. 4B is a fitting curve of a verification set in a DXN data set according to an embodiment of the present disclosure; and
FIG. 4C is a fitting curve of a testing set in a DXN data set according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The technical solutions of embodiments of the present disclosure will be clearly and completely described below with reference to accompanying drawings. Apparently, the described embodiments are merely some embodiments rather than all embodiments of the present disclosure. All other embodiments derived by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative efforts shall fall within the protection scope of the present disclosure.
An objective of the present disclosure is to provide a broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in a municipal solid waste incineration (MSWI) process. A soft sensor modeling algorithm based on broad hybrid forest regression (BHFR) is provided with measurement of a dioxin (DXN) emission concentration in the MSWI process as the target.
To make the above objectives, features, and advantages of the present disclosure clearer and more comprehensible, the present disclosure will be further described below in detail with reference to the accompanying drawings and specific implementation modes.
As shown in FIG. 1, the broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in an MSWI process according to the present disclosure includes:
- based on a broad learning system (BLS) framework, a BHFR soft sensor model is constructed for small sample high-dimensional data by replacing a neuron with a non-differential base learner, where the BHFR soft sensor model includes a feature mapping layer, a latent feature extraction layer, a feature incremental layer and an incremental learning layer, and the method specifically includes:
- S1, the feature mapping layer is constructed, and a high-dimensional feature is mapped by constructing a hybrid forest group composed of a random forest (RF) and a completely random forest (CRF).
S2, the latent feature extraction layer is constructed, a latent feature is extracted from a feature space of a fully connected hybrid matrix according to a contribution rate, maximum transmission and minimum redundancy of potentially valuable information are guaranteed based on an information measurement criterion, and model complexity and computation consumption are reduced.
S3, the feature incremental layer is constructed, and a feature representation capacity is further enhanced by training the feature incremental layer based on an extracted latent feature.
S4, the incremental learning layer is constructed based on an incremental learning strategy, a weight matrix is obtained with a Moore-Penrose pseudo-inverse, and high-precision modeling of the BHFR soft sensor model is implemented.
S5, the soft sensor model is verified by using an industrial process dioxin (DXN) data set.
S6, soft sensing of DXN emission in the MSWI process is performed with the soft sensor model constructed in steps S1-S5.
The MSWI process includes technical stages such as solid waste storage and transportation, solid waste incineration, waste heat boilers, steam power generation, flue gas purification and flue gas emission. A grate MSWI process with a daily capacity of 800 tons is taken as an example, and a technical flow is shown in FIG. 2.
An entire process of DXN decomposition, generation, adsorption and emission is combined to describe main functions at each stage as follows:
1) At the solid waste storage and transportation stage: sanitation vehicles transport MSW from various collection stations in a city to an MSWI power plant, and after being weighed and recorded, the MSW is dumped from an unloading platform into an unfermented area of a solid waste storage pool, mixed and stirred with a solid waste grab, grabbed to a fermentation area, and fermented and dehydrated for 3-7 days to guarantee a low calorific value of MSW incineration. Research shows that primary MSW contains a small amount of DXN (about 0.8 ng TEQ/kg) and a variety of chlorine-containing compounds needed for formation of DXN.
2) At the solid waste incineration stage: fermented MSW is thrown by the solid waste grab into a feed hopper and is pushed into an incinerator through a feeder. After passing in turn through the drying grate, combustion grates 1 and 2 and the burning-through grate, combustible components in the MSW are completely burned; required combustion-supporting air is injected from a lower portion of the grate and a middle portion of the furnace by a primary fan and a secondary fan, and ash produced by final combustion falls from the end of the burning-through grate onto a submerged chain conveyor and is sent to a slag pool after water cooling. In order to ensure that the DXN contained in the original MSW and generated during incineration is completely decomposed under the high-temperature combustion conditions in the furnace, the flue gas temperature during furnace combustion is required to be controlled strictly above 850° C., and the residence time of the high-temperature flue gas in the furnace must exceed 2 seconds with sufficient flue gas turbulence guaranteed.
3) At the waste heat boiler stage: the high-temperature flue gas (higher than 850° C.) generated in the furnace is pumped into a waste heat boiler system by an induced draft fan, and then passes through a superheater, an evaporator and an economizer. After heat exchange between the high-temperature flue gas and liquid water in a boiler drum, high-temperature steam is generated, the high-temperature flue gas is cooled, and the temperature of the flue gas at an outlet of the waste heat boiler (that is, flue gas G1) is lower than 200° C. From the perspective of the formation mechanism of DXN, the chemical reactions that lead to the formation of DXN when the high-temperature flue gas is cooled by the waste heat boiler include high-temperature gas phase synthesis (800° C.-500° C.), precursor synthesis (450° C.-200° C.) and de novo synthesis (350° C.-250° C.), but there is no unified conclusion yet.
4) At the steam power generation stage: the high-temperature steam generated by the waste heat boiler is used to drive a turbine generator, and mechanical energy is converted into electrical energy, so as to implement self-sufficiency of power consumption at a plant level and on-grid power supply of surplus power, resource recycling and obtain economic benefits.
5) At the flue gas purification stage: the flue gas purification in the MSWI process mainly includes a series of processes such as denitration (NOx), desulfurization (HCl, HF, SO2, etc.), heavy metal removal (Pb, Hg, Cd, etc.), dioxin (DXN) adsorption and dust removal (particulate matter), so as to satisfy emission standards for incineration flue gas pollutants. Adsorbing DXN in incineration flue gas with an activated carbon injection system is the most widely used technical means, and the adsorbed DXN is enriched in fly ash.
6) At the flue gas emission stage: incineration flue gas including trace DXN after cooling and purification (that is, flue gas G2) is sucked by the induced draft fan and emitted into the atmosphere through a chimney. The uninterrupted, long-term running of the MSWI process leads to a large quantity of DXN attached to particles on an inner wall of the chimney (that is, memory effects). The influence of these memory effects on emission under actual operation conditions is still a research problem at present.
At present, research on the soft sensor of DXN in the MSWI process mainly focuses on measurement of the DXN concentration at the emission stage (that is, flue gas G3), and a research focus of the present application is to construct a soft sensor model for flue gas G3.
The BHFR modeling strategy according to the present application includes the feature mapping layer, the latent feature extraction layer, the feature incremental layer and the incremental learning layer.
In FIG. 1, {X,y}ϵRNRaw×(M+1) represents original data, where XϵRNRaw×M represents original input data that are sourced from the six different stages in the MSWI process above and are collected and stored in a distributed control system (DCS) in seconds, NRaw represents a number of the original data, M represents a dimension of the original input data, and yϵRNRaw×1 represents an output truth of a DXN emission concentration sourced from an emission DXN measurement sample obtained through an off-line measurement method; {DT1, . . . , DTJ} represents J decision tree models in a hybrid forest algorithm, DT1 represents the first decision tree model and DTJ represents the Jth decision tree model; Bootstrap and RSM represent sample sampling and feature sampling on input data respectively; {RFn,CRFn} represents the nth hybrid forest group model, and RFn and CRFn represent the nth RF and CRF models; {Groupn}n=1N represents that the feature mapping layer includes N hybrid forest group models; ZN represents an output from the feature mapping layer; HK represents an output from the feature incremental layer; [X|ZN] represents a fully connected hybrid matrix composed of the original data and ZN; X′ϵRNRaw×MPCAMI represents new training data after latent feature extraction; {Groupk}k=1K represents K hybrid forest group models included in the feature incremental layer; {Groupp}p=1P represents P hybrid forest group models included in the incremental learning layer; and WK+P represents a final weight matrix.
Each part has main functions as follows:
1) The feature mapping layer: the original input data XϵRNRaw×M from six different stages in the MSWI process are subjected to feature mapping through the N hybrid forest groups {RFn,CRFn}n=1N in the feature mapping layer to obtain a mapping output matrix ZN.
2) The latent feature extraction layer: a latent feature is extracted, through principal component analysis, from the fully connected hybrid matrix [X|ZN] composed of the original input data XϵRNRaw×M and the output ZN from the feature mapping layer, redundant information is removed from the feature space, and a latent feature dimension is further determined and a new training set X′ϵRNRaw×MPCAMI is obtained through mutual information between an extracted latent feature and an output truth y of the DXN emission concentration.
3) The feature incremental layer: the new training set X′ϵRNRaw×MPCAMI is taken as an input, feature mapping is performed through the K hybrid forest groups {RFk,CRFk}k=1K of the feature incremental layer, and an output matrix HK of the feature incremental layer is obtained.
4) The incremental learning layer: the new training set X′ϵRNRaw×MPCAMI is taken as an input, and the weight WK+P is gradually increased and updated with the hybrid forest group as the minimum unit until a training error converges.
Essentially, the BHFR replaces a neuron in the original BLS by taking a hybrid forest group composed of the RF and the CRF as the basic mapping unit. S1, in which the feature mapping layer is constructed and a high-dimensional feature is mapped by constructing a hybrid forest group composed of an RF and a CRF, specifically includes:
- original data are set as {X,y}, where XϵRNRaw×M represents original input data that are sourced from six different stages in the MSWI process and are collected and stored in a distributed control system (DCS) in seconds, NRaw represents a number of the original data, M represents a dimension of the original input data, and yϵRNRaw×1 represents an output truth of a DXN emission concentration sourced from an emission DXN measurement sample obtained through an off-line measurement method; a modeling process of the feature mapping layer is described by taking the nth hybrid forest group of the feature mapping layer as an example:
- J training subsets of a hybrid forest group model are obtained by performing Bootstrap and random subspace (RSM) sampling on {X,y} as follows:
where XBootstrapn,j and yBootstrapn,j represent an input and an output of the jth training subset, ϕnFML(⋅) and φnFML(⋅) represent Bootstrap sampling and RSM sampling of the nth hybrid forest group in the feature mapping layer, and PBootstrap represents a Bootstrap sampling probability;
- a hybrid forest algorithm including J decision trees is trained based on {XBootstrapn,j,yBootstrapn,j}j=1J, where the jth decision tree of the nth hybrid forest group in the feature mapping layer is expressed as follows:
- where L represents a number of leaf nodes of the decision tree, I(⋅) represents an indicator function, and cl is computed through recursive splitting;
- a splitting loss function Ωi(⋅) of the decision tree in the RF is expressed as:
- where Ωi(s,v) represents the loss function value of the splitting criterion when the sth feature is split at value v, yL represents a truth vector of a DXN emission concentration at a left leaf node, E[yL] represents a mathematical expectation of yL, yR represents a truth vector of a DXN emission concentration at a right leaf node, E[yR] represents a mathematical expectation of yR, yLi represents the ith DXN emission concentration truth at the left leaf node, yRi represents the ith DXN emission concentration truth at the right leaf node, cL represents a predicted output of the DXN emission concentration at the left leaf node, and cR represents a predicted output of the DXN emission concentration at the right leaf node;
- by minimizing Ωi(s,v), a training set (XBootstrapn,j,yBootstrapn,j) is split into two tree nodes as follows:
- where RLNL×M and RRNR×M represent sample sets included in a left tree node and a right tree node after division respectively, NL and NR represents a number of samples in RLNL×M and RRNR×M respectively;
- predicted output values cLRF and cRRF of the DXN emission concentration at a current left tree node and a current right tree node are expressed as sample truth expectations as follows:
- where yL and yR represent truth vectors of the DXN emission concentration in RLNL×M and RRNR×M, and E[yL] and E[yR] represent mathematical expectations of yL and yR;
- different from the RF, a decision tree in the CRF is split in a completely random selection mode expressed as follows:
- where rand{(s,v)i}i=1NRaw×M represents that a value v of the sth feature is completely randomly selected as a split point;
- predicted output values cLCRF and cRCRF of the DXN emission concentration at a left tree node and a right tree node that are randomly split as sample truth expectations are expressed as follows:
- through the above process, the nth hybrid forest group fnFML(⋅) is expressed as follows:
- where fn,RFFML(⋅) represents the nth random forest, and fn,CRFFML represents the nth completely random forest;
- the nth mapped feature Zn is expressed as
- where (c1,ln,RF, c1,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of a first sample of original input data from six different stages in the MSWI process, (cnRaw,ln,RF,cnRaw,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of the nRawth sample of original input data from six different stages in the MSWI process, and (cNRaw,ln,RF,cNRaw,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of the NRawth sample of original input data from six different stages in the MSWI process; and
- an output of the feature mapping layer is expressed as:
- where Z1 represents the first mapped feature, Z2 represents the second mapped feature, ZN represents the Nth mapped feature, and a mapped feature matrix ZN includes NRaw samples and a 2N dimensional feature.
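To make the construction of S1 concrete, the following is a minimal Python sketch of the feature mapping layer, assuming scikit-learn estimators as stand-ins for the forests described above: RandomForestRegressor for the RF (Bootstrap plus RSM sampling) and ExtraTreesRegressor with max_features=1 as an approximation of the CRF. The function names and hyperparameter values are illustrative only and are not part of the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor

def fit_hybrid_forest_group(X, y, n_trees=10, min_leaf=7, seed=0):
    """Train one {RF, CRF} pair and return its two-column mapped feature Z_n."""
    rf = RandomForestRegressor(n_estimators=n_trees, min_samples_leaf=min_leaf,
                               max_features="sqrt",  # RSM feature sampling
                               bootstrap=True,       # Bootstrap sample sampling
                               random_state=seed)
    crf = ExtraTreesRegressor(n_estimators=n_trees, min_samples_leaf=min_leaf,
                              max_features=1,        # completely random splits
                              bootstrap=True, random_state=seed)
    rf.fit(X, y)
    crf.fit(X, y)
    return rf, crf, np.column_stack([rf.predict(X), crf.predict(X)])

def feature_mapping_layer(X, y, n_groups=10):
    """N hybrid forest groups -> mapped feature matrix Z^N (N_Raw x 2N)."""
    groups, zs = [], []
    for n in range(n_groups):
        rf, crf, z_n = fit_hybrid_forest_group(X, y, seed=n)
        groups.append((rf, crf))
        zs.append(z_n)
    return groups, np.hstack(zs)
```

Each group contributes two columns (the RF and CRF predicted outputs), so N groups yield the NRaw×2N mapped feature matrix ZN described above.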
In order to avoid an over-fitting phenomenon caused by an information loss during information transmission, the BHFR according to the present application uses a full connection strategy to implement information transmission between the feature mapping layer, the feature incremental layer and the incremental learning layer. Moreover, in order to minimize information redundancy during model training, the PCA is used to extract the latent feature from a feature space of the fully connected hybrid matrix, and then mutual information is used to further screen a latent feature related to maximization of truth information, so as to reduce a dimension of high-dimensional data.
S2 that the latent feature extraction layer is constructed, a latent feature is extracted from a feature space of a fully connected hybrid matrix according to a contribution rate, maximum transmission and minimum redundancy of potentially valuable information are guaranteed based on an information measurement criterion, and model complexity and computation consumption are reduced specifically includes:
- a fully connected hybrid matrix A is obtained and expressed by combining original input data X from six different stages in the MSWI process with the mapped feature matrix ZN as follows:
- where A includes NRaw samples and a (M+2N) dimensional feature;
- considering that a dimension of A is much higher than a dimension of the original data, redundant information in A is minimized through principal component analysis (PCA), and a correlation matrix R of A is computed as follows:
- (M+2N) feature values and corresponding feature vectors are obtained by performing singular value decomposition on R as follows:
- where U(M+2N) represents a (M+2N) order orthogonal matrix, Σ(M+2N) represents a (M+2N) order diagonal matrix, and V(M+2N) represents a (M+2N) order orthogonal matrix;
- where σ1>σ2> . . . >σ(M+2N) represents eigenvalues arranged in a descending order;
- a number of final principal components is determined according to a set latent feature contribution threshold η,
- where a number of latent features is QPCA«(M+2N);
- based on the QPCA latent features determined above, a feature vector matrix VQPCA (that is, a projection matrix of A) corresponding to the eigenvalue set {σq}q=1QPCA is obtained, redundant information is minimized by performing feature projection on A, and an obtained latent feature is recorded as XPCA, that is,
where VQPCAϵR(M+2N)×QPCA represents the eigenvectors of the first QPCA latent features;
- a mutual information value IMI between a selected latent feature XPCA and a truth yϵRNRaw×1 is computed as follows:
- where p(xqPCA,y) represents a joint probability distribution of the qth latent feature xqPCA and a DXN emission concentration truth y, p(xqPCA) represents a marginal probability distribution of the qth latent feature xqPCA, and p(y) represents a marginal probability distribution of the DXN emission concentration truth y;
- a correlation between the selected latent feature and the truth is guaranteed through an information maximization selection mechanism, and the correlation is expressed as:
- where {IqMI}q=1QPCA represents the mutual information values between the QPCA latent features xqPCA and the truth y, ζ represents an information maximization threshold, and the retained features are the QPCAMI latent features having a greatest correlation with information of the DXN emission concentration truth y; and
- a new data set {X′,y}ϵRNRaw×(QPCAMI+1) including QPCAMI latent features is obtained, and a dimension MPCAMI=QPCAMI after extraction is set.
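A minimal Python sketch of the latent feature extraction layer follows, assuming scikit-learn's PCA and mutual_info_regression in place of the singular value decomposition and mutual information computation described above. The text does not specify exactly how the threshold ζ is applied, so screening features whose mutual information reaches a fraction ζ of the maximum is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_regression

def latent_feature_extraction(X, Z, y, eta=0.9, zeta=0.75):
    """PCA on the fully connected hybrid matrix A=[X|Z], then MI screening."""
    A = np.hstack([X, Z])                           # fully connected hybrid matrix
    pca = PCA(n_components=eta, svd_solver="full")  # keep eta of the variance
    X_pca = pca.fit_transform(A)                    # Q_PCA latent features
    mi = mutual_info_regression(X_pca, y)           # MI with the DXN truth y
    keep = mi >= zeta * mi.max()                    # information maximization screen
    return X_pca[:, keep], pca, keep
```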
S3 that the feature incremental layer is constructed, and a feature representation capacity is further enhanced by training the feature incremental layer based on an extracted latent feature specifically includes:
- J training subsets of a hybrid forest algorithm are obtained by performing Bootstrap sampling and RSM sampling on the new data set {X′,y} as follows:
- where X′Bootstrapk,j and yBootstrapk,j represent an input and an output of the jth training subset, X′ and y represent an input and an output of a new training set respectively, ϕkFEL(⋅) represents Bootstrap sampling on the kth hybrid forest group, and φkFEL(⋅) represents RSM sampling on the kth hybrid forest group;
- construction of the jth decision tree of the RF in the kth hybrid forest group is taken as an example as follows:
- where fk,jDT-RF(⋅) represents a jth decision tree of the RF in the kth hybrid forest group in the feature incremental layer; L represents a number of leaf nodes of the decision tree; and cl is computed through recursive splitting with specific process formulas of (3)-(5);
- an RF model in the kth hybrid forest group in the feature incremental layer is obtained and the RF model is expressed as follows:
- construction of the jth decision tree of the CRF in the kth hybrid forest group is taken as an example similarly as follows:
- where fk,jDT-CRF(⋅) represents the jth decision tree of the CRF in the kth hybrid forest group in the feature incremental layer; and cl is computed through recursive splitting with specific process formulas of (6)-(7);
- a CRF model in the kth hybrid forest group in the feature incremental layer is obtained and the CRF model is expressed as follows:
- the kth hybrid forest group fkFEL(⋅) through the above process is obtained, and the kth enhanced feature is expressed as follows:
where (c1,lk,RF, c1,lk,CRF) represents enhanced mapping on a first sample in new data through the kth hybrid forest group, (cnRaw,lk,RF,cnRaw,lk,CRF) represents enhanced mapping on the nRawth sample in the new data through the kth hybrid forest group, and (cNRaw,lk,RF,cNRaw,lk,CRF) represents enhanced mapping on the NRawth sample in the new data through the kth hybrid forest group;
- an output HK of the feature incremental layer is expressed as follows:
- where H1 represents a first enhanced feature, H2 represents a second enhanced feature, and HK represents the Kth enhanced feature;
- a BHFR model is expressed without considering the incremental learning strategy as follows:
- where GK represents a combination of outputs from the feature mapping layer and the feature incremental layer, that is GK=[ZN|HK], and includes NRaw samples and a (2N+2K) dimensional feature; and WK represents a weight of each of the feature mapping layer and the feature incremental layer relative to the output layer, and is computed as follows:
- where I represents a unit matrix, and λ represents a regularization term coefficient; and a pseudo-inverse computation of GK is expressed accordingly as:
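In code, the weight computation above reduces to a ridge-regularized least-squares solve; the following minimal sketch assumes that standard closed form (not quoted from the text):

```python
import numpy as np

def output_weights(G, y, lam=2.0 ** -10):
    """W = (lam*I + G^T G)^{-1} G^T y for G = [Z^N | H^K]."""
    return np.linalg.solve(G.T @ G + lam * np.eye(G.shape[1]), G.T @ y)
```

With G of size NRaw×(2N+2K), the returned W has one weight per mapped or enhanced feature column.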
The BHFR according to the present application takes the hybrid forest group as a basic unit to implement incremental learning according to the convergence degree of a training error. S4 that the incremental learning layer is constructed based on an incremental learning strategy, a weight matrix is obtained with a Moore-Penrose pseudo-inverse, and high-precision modeling of the BHFR soft sensor model is implemented specifically includes:
- a training subset of the hybrid forest algorithm is obtained by performing Bootstrap sampling and RSM sampling on the new data set {X′,y} in a process as follows:
- where X′Bootstrapp,j and yBootstrapp,j represent the input and the output of the jth training subset of the hybrid forest algorithm, X′ and y represent the input and the output of the new training set respectively, and ϕpILL(⋅) and φpILL(⋅) represent Bootstrap sampling and RSM sampling of a pth hybrid forest group in the incremental learning layer;
- decision trees fp,RFILL(⋅) and fp,CRFILL(⋅) in the pth hybrid forest group are constructed in the same process (not repeated herein) as the feature mapping layer and the feature incremental layer;
- under the condition that one hybrid forest group is added, outputs GK+1 of the feature mapping layer, the feature incremental layer and the incremental learning layer are expressed as follows:
- where GK=[ZN|HK] includes NRaw samples and a (2N+2K) dimensional feature, and GK+1 includes NRaw samples and a (2N+2K+2J) dimensional feature;
- a Moore-Penrose inverse matrix of GK+1 is updated recursively as follows:
- where a matrix C and a matrix D are computed as follows:
- a recursive formula of the Moore-Penrose inverse matrix of GK+1 is expressed as follows:
- an updating matrix WK+1 of a weight of the feature mapping layer, the feature incremental layer, the incremental learning layer relative to the output layer is computed as follows:
- rapid incremental learning is implemented due to the fact that a pseudo-inverse matrix of the hybrid forest group in the incremental learning layer is merely required to be computed according to a pseudo-inverse updating strategy above;
- adaptive incremental learning is implemented according to a convergence degree of a training error;
a number P of hybrid forest groups of incremental learning is determined by defining a convergence threshold of the error as θCon, and an incremental learning training error of the BHFR model is expressed accordingly as follows:
- where the left-hand side of the above expression represents a training error difference between the (p+1)th and the pth hybrid forest groups of incremental learning, and √((GK+pWK+p−y)2) and √((GK+p+1WK+p+1−y)2) represent training errors of a BHFR model including p hybrid forest groups and a BHFR model including p+1 hybrid forest groups respectively; and
a predicted output Ŷ of the BHFR soft sensor model is expressed as follows:
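A minimal Python sketch of the incremental update follows, implementing the pseudo-inverse recursion standard for broad learning systems; the matrices C and D correspond to those referenced above, while the zero-test tolerance and branch handling are assumptions. H is the block of new feature columns produced by the added hybrid forest group.

```python
import numpy as np

def incremental_update(G, G_pinv, W, H, y, tol=1e-12):
    """Append feature block H, update pseudo-inverse and weights recursively.

    y is an (N_Raw, 1) truth column vector; W is the current (m, 1) weight matrix.
    """
    D = G_pinv @ H                 # matrix D in the text
    C = H - G @ D                  # matrix C in the text
    if np.linalg.norm(C) > tol:
        B_T = np.linalg.pinv(C)
    else:
        B_T = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ G_pinv)
    G_new = np.hstack([G, H])
    G_pinv_new = np.vstack([G_pinv - D @ B_T, B_T])
    W_new = np.vstack([W - D @ (B_T @ y), B_T @ y])
    return G_new, G_pinv_new, W_new
```

In use, hybrid forest groups would be appended one at a time and the loop stopped once the drop in training error between consecutive groups falls below the convergence threshold θCon, as described above.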
The present application uses actual DXN data of an MSWI power plant for industrial verification. The DXN data are sourced from an MSWI power plant in Beijing and include 141 sets of DXN emission concentration modeling data from 2009 to 2020. The DXN truth is the concentration converted after 2-hour sampling and assay; after removal of missing data and abnormal variables, the input is 116-dimensional, and each input value is the mean over the sampling period corresponding to the DXN truth.
In the present application, a root mean square error (RMSE), a mean absolute error (MAE) and a coefficient of determination (R2) are selected to compare the performance of different methods with computation as follows:
- where N represents a number of data, yi represents the ith truth, ŷi represents the ith predicted value and ȳ represents a mean value.
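A minimal sketch of the three indices as conventionally defined (the formulas themselves are not reproduced in this text, so the standard definitions are assumed):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return RMSE, MAE and R2 for two equal-length 1-D arrays."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return rmse, mae, r2
```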
In the DXN data set, parameters of the BHFR method are set as follows: a minimum sample number Nsamples of leaf nodes of a decision tree is 7, a number of RSM feature selection is √Nfeatures, a number Ntree of decision trees is 10, a number NForest of hybrid forest groups in each of the feature mapping layer and the feature incremental layer is 10, a latent feature contribution rate threshold η is 0.9, and a regularization parameter λ is 2^(−10).
The number of latent features for the feature incremental layer and the incremental learning layer is determined from the fully connected hybrid matrix and its feature space A. The feature dimension of A in the DXN data set is 316. When the latent feature contribution rate threshold η is 0.9, the number of latent features selected from the DXN data set is 35. Then, mutual information values between the 35 latent features and the DXN truth are computed. The mutual information threshold ζ is set as 0.75, and the number of latent features finally selected from the DXN data set is 6.
Further, a number of hybrid forest group units in the incremental learning layer is preset to be 1000, and accordingly a relation between the training error of the BHFR model and a number of hybrid forest groups is shown in FIG. 3.
According to a training error curve shown in FIG. 3, a training process of the BHFR on the DXN data set may converge to a certain lower limit.
Then, RF, DFR, DFR-clfc and BLS-NN are used for comparison with the provided BHFR, and parameters are set as follows: (1) in the case of the RF, a minimum sample number Nsamples of leaf nodes of a decision tree is 3, a number of RSM feature selection is √Nfeatures, and a number Ntree of decision trees is 500; (2) in the case of DFR, a minimum sample number Nsamples of leaf nodes of a decision tree is 3, a number of RSM feature selection is √Nfeatures, a number Ntree of decision trees is 500, a number NRF of RF models and a number NCRF of CRF models in each layer are 2, and a total number of layers is set as 50; (3) in the case of DFR-clfc, a minimum sample number Nsamples of leaf nodes of a decision tree is 3, a number of RSM feature selection is √Nfeatures, a number Ntree of decision trees is 500, a number NRF of RF models and a number NCRF of CRF models in each layer are 2, and a total number of layers is set as 50; and (4) in the case of BLS-NN, a number Nm of feature nodes is 5, a number Ne of enhanced nodes is 41, a number Nn of neurons is 9 and the regularization parameter λ is 2^(30). Each of the above methods is repeated 20 times under the same conditions, and statistical results and prediction curves are shown in Table 1 and FIGS. 4A-4C.
TABLE 1
Experiment results of the DXN data set

Method     Data set           RMSE (Mean / Variance)        MAE (Mean / Variance)         R2 (Mean / Variance)
RF         Training set       1.1159E−02 / 5.7497E−08       9.0221E−03 / 4.0684E−08       8.5346E−01 / 3.9360E−05
           Verification set   2.0051E−02 / 1.8026E−07       1.4677E−02 / 8.2255E−08       5.0196E−01 / 4.3515E−04
           Testing set        1.6922E−02 / 1.6150E−07       1.3548E−02 / 8.9520E−08       5.9001E−01 / 3.7817E−04
DFR        Training set       1.1493E−02 / 8.7413E−09       9.4568E−03 / 4.6626E−09       8.4463E−01 / 6.3663E−06
           Verification set   2.0735E−02 / 9.7835E−09       1.5780E−02 / 1.1121E−08       4.6759E−01 / 2.5813E−05
           Testing set        1.7791E−02 / 1.7308E−08       1.4608E−02 / 1.5235E−08       5.4701E−01 / 4.5066E−05
DFR-clfc   Training set       8.0852E−03 / 2.9078E−06       6.6040E−03 / 2.0819E−06       9.1986E−01 / 1.1887E−03
           Verification set   2.0187E−02 / 1.4562E−07       1.5626E−02 / 2.3355E−08       4.9520E−01 / 3.6404E−04
           Testing set        1.7025E−02 / 1.5755E−07       1.4068E−02 / 6.0233E−08       5.8501E−01 / 3.7843E−04
BLS-NN     Training set       1.2924E−09 / 1.5756E−18       9.5358E−10 / 7.2150E−19       1.0000E+00 / 8.2358E−29
           Verification set   6.8845E−02 / 7.0040E−04       5.3153E−02 / 3.3474E−04       −5.6928E+00 / 3.7799E+01
           Testing set        7.8396E−02 / 6.7692E−04       6.0922E−02 / 4.1785E−04       −8.7153E+00 / 4.7630E+01
BHFR       Training set       6.0665E−03 / 1.6330E−08       3.9665E−03 / 8.4708E−09       9.5669E−01 / 3.3481E−06
           Verification set   2.1551E−02 / 3.5181E−08       1.2384E−02 / 3.5083E−08       4.2484E−01 / 9.8731E−05
           Testing set        1.6189E−02 / 2.2474E−08       1.1226E−02 / 1.0102E−08       6.2491E−01 / 4.8607E−05
It may be seen from Table 1 and FIGS. 4A-4C that: 1) RF is superior to DFR in the mean statistical results of RMSE, MAE and R2 during training, verification and testing, but inferior to DFR in stability. 2) DFR and DFR-clfc are close to RF in modeling accuracy and better than RF in modeling stability; DFR-clfc is slightly more accurate than DFR on the training set, the verification set and the testing set, but inferior to DFR in stability. 3) BLS-NN shows obvious over-fitting on the training data, and poor generalization performance and stability on the verification set and the testing set, indicating that BLS-NN is difficult to apply to the small sample high-dimensional data of the real industrial process considered in the present application. 4) BHFR achieves the best mean statistical results in the RMSE, MAE and R2 indicators on the testing set, and is second only to DFR in stability, indicating that BHFR has desirable generalization performance and stability.
To sum up, the DXN soft sensor modeling experiment shows that the BHFR according to the present application is better than classic RF, DFR and improved version DFR-clfc in training and learning capacity, is superior to RF, DFR, DFR-clfc and BLS-NN in modeling accuracy and a data fitting degree in the testing set, and shows obvious advantages in constructing the DXN soft sensor model.
The broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in an MSWI process according to the present disclosure constructs the soft sensor model based on the BHFR, and the soft sensor model combines algorithms such as broad learning modeling, ensemble learning and latent feature extraction: 1) based on the BLS framework, the non-differential learner is used to construct the soft sensor model including the feature mapping layer, the latent feature extraction layer, the feature incremental layer and the incremental learning layer; 2) internal information of the BHFR model is processed through full information connection, latent feature extraction and mutual information measurement, so as to effectively ensure maximum transmission and minimum redundancy of internal feature information of the BHFR model; 3) the hybrid forest group is used as a mapping unit for incremental learning in a modeling process, the weight matrix of the output layer is rapidly computed according to the pseudo-inverse strategy, and the incremental learning is adaptively adjusted based on the convergence degree of the training error, thereby implementing high-precision soft sensor modeling. The validity and rationality of the method are verified by using a high-dimensional industrial process DXN data set.
Specific examples are used herein to explain the principles and implementation modes of the present disclosure. The foregoing description of the embodiments is merely intended to help understand the method of the present disclosure and its core ideas. Besides, various modifications may be made by a person of ordinary skill in the art to specific embodiments and the scope of application in accordance with the ideas of the present disclosure. In conclusion, the content of the present specification shall not be construed as limitation to the present disclosure.