TECHNICAL FIELD
The present disclosure relates to the technical field of soft sensors for dioxin (DXN) emission, in particular to a broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in a municipal solid waste incineration (MSWI) process.
BACKGROUND
Municipal solid waste incineration (MSWI), a global solution to “waste besieging cities” at present, features safe disposal, waste reduction and resource recycling. MSWI emits dioxin (DXN) and is required to minimize such emission, since DXN is a persistent and highly toxic organic pollutant in the waste gas and gives rise to the “not-in-my-back-yard (NIMBY)” phenomenon when incineration plants are constructed. An off-line assay and analysis method based on high-resolution gas chromatography/high-resolution mass spectrometry (HRGC/HRMS) is the current central tool for measuring a DXN emission concentration. The method suffers from great technical difficulty, a long time lag and high labor and economic costs, and has in turn become a key factor hindering real-time optimal control over the MSWI process. To this end, on-line measurement of the DXN emission concentration has become a foremost challenge during MSWI.
To solve the above problems, attention has been focused on online indirect measurement of the DXN concentration by constructing a correlation model with DXN-related substances that can be measured online. However, this method has drawbacks such as apparatus complexity, high cost, susceptibility to interference and, being in essence a measurement method combined with data modeling, unpredictable prediction accuracy. Compared with off-line analysis and on-line indirect measurement methods, soft sensor technology driven by easy-to-measure process data collected through an industrial distributed control system can effectively solve the problem that the DXN concentration cannot be measured online. With stability, accuracy and quick response, soft sensor technology has been extensively applied to difficult-to-measure parameters in complex industrial processes of petroleum, chemical industry, steelmaking, etc.
SUMMARY
An objective of the present disclosure is to provide a broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in a municipal solid waste incineration (MSWI) process. A soft sensor modeling algorithm based on broad hybrid forest regression (BHFR) is provided with measurement of a DXN emission concentration in the MSWI process as the target.
To achieve the above objective, the present disclosure provides the following technical solutions.
A broad hybrid forest regression (BHFR)-based soft sensor method for dioxin (DXN) emission in a municipal solid waste incineration (MSWI) process includes: based on a broad learning system (BLS) framework, constructing a BHFR soft sensor model for small sample high-dimensional data by replacing a neuron with a non-differential base learner, where the BHFR soft sensor model includes a feature mapping layer, a latent feature extraction layer, a feature incremental layer and an incremental learning layer, and the method specifically includes:
- S1, constructing the feature mapping layer, and mapping a high-dimensional feature by constructing a hybrid forest group composed of a random forest (RF) and a completely random forest (CRF);
- S2, constructing the latent feature extraction layer, extracting a latent feature from a feature space of a fully connected hybrid matrix according to a contribution rate, guaranteeing maximum transmission and minimum redundancy of potentially valuable information based on an information measurement criterion, and reducing model complexity and computation consumption;
- S3, constructing the feature incremental layer, and further enhancing a feature representation capacity by training the feature incremental layer based on an extracted latent feature;
- S4, constructing the incremental learning layer based on an incremental learning strategy, obtaining a weight matrix with a Moore-Penrose pseudo-inverse, and implementing high-precision modeling of the BHFR soft sensor model;
- S5, verifying the soft sensor model by using an industrial process DXN data set; and
- S6, performing soft sensing of DXN emission in the MSWI process with the soft sensor model constructed in steps S1-S5.
Further, S1 of constructing the feature mapping layer, and mapping a high-dimensional feature by constructing a hybrid forest group composed of a RF and a CRF specifically includes:
- setting original data as {X,y}, where XϵRNRaw×M represents original input data that are sourced from six different stages in the MSWI process and are collected and stored in a distributed control system (DCS) in seconds, NRaw represents a number of the original data, M represents a dimension of the original input data, and yϵRNRaw×1 represents an output truth of a DXN emission concentration sourced from an emission DXN measurement sample obtained through an off-line measurement method; describing a modeling process of the feature mapping layer by taking an nth hybrid forest group of the feature mapping layer as an example:
- obtaining J training subsets of a hybrid forest group model by performing Bootstrap and random subspace (RSM) sampling on {X,y} as follows:
- where XBootstrapn,j and yBootstrapn,j represent an input and an output of the jth training subset respectively, ϕnFML(⋅) and φnFML(⋅) represent Bootstrap sampling and RSM sampling of the nth hybrid forest group in the feature mapping layer respectively, and PBootstrap represents a Bootstrap sampling probability;
- training a hybrid forest algorithm including J decision trees based on {XBootstrapn,j,yBootstrapn,j}j=1J, where a jth decision tree of the nth hybrid forest group in the feature mapping layer is expressed as follows:
where L represents a number of leaf nodes of the decision tree, I(⋅) represents an indicator function, and cl is computed through recursive splitting;
- expressing a splitting loss function Ωi(⋅) of the decision tree in the RF (a reconstruction of the omitted formulas is given after this list) as:
- where Ωi(s,v) represents the loss function value of the splitting criterion when the sth feature is split at value v, yL represents a truth vector of a DXN emission concentration at a left leaf node, E[yL] represents a mathematical expectation of yL, yR represents a truth vector of a DXN emission concentration at a right leaf node, E[yR] represents a mathematical expectation of yR, yLi represents the ith DXN emission concentration truth at the left leaf node, yRi represents the ith DXN emission concentration truth at the right leaf node, cL represents a predicted output of the DXN emission concentration at the left leaf node, and cR represents a predicted output of the DXN emission concentration at the right leaf node;
splitting, by minimizing Ωi(s,v), a training set (XBootstrapn,j,yBootstrapn,j) into two tree nodes as follows:
- where RLNL×M and RRNR×M represent sample sets included in a left tree node and a right tree node after division respectively, NL and NR represents a number of samples in RLNL×M and RRNR×M respectively;
- expressing predicted output values cLRF and cRRF of the DXN emission concentration at a current left tree node and a current right tree node as sample truth expectations as follows:
- where yL and yR represent truth vectors of the DXN emission concentration in RLNL×M and RRNR×M, and E[yL] and E[yR] represent mathematical expectations of yL and yR;
- different from the RF, expressing a splitting loss function in the CRF in a completely random selection mode as follows:
- where rand{(s,v)i}i=1NRaw×M represents that a value v of the sth feature is completely randomly selected as a split point;
- expressing predicted output values cLCRF and cRCRF of the DXN emission concentration at a left tree node and a right tree node that are randomly split as sample truth expectations as follows:
- expressing, through the above process, the nth hybrid forest group fnFML(⋅) as follows:
- where fn,RFFML(⋅) represents the nth random forest, and fn,CRFFML(⋅) represents the nth completely random forest;
- expressing an nth mapped feature Zn as follows:
- where (c1,ln,RF,c1,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of a first sample of original input data from six different stages in the MSWI process, (cnRaw,ln,RF,cnRaw,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of an nRawth sample of original input data from six different stages in the MSWI process, and (cNRaw,ln,RF,cNRaw,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of an NRawth sample of original input data from six different stages in the MSWI process; and
- expressing an output of the feature mapping layer as:
- where Z1 represents a first mapped feature, Z2 represents a second mapped feature, ZN represents an Nth mapped feature, and a mapped feature matrix ZN includes NRaw samples and a 2N dimensional feature.
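The display formulas of S1 are omitted above (they were figures in the original filing). A plausible reconstruction in standard notation, assuming the usual CART least-squares splitting criterion and agreeing with the definitions in the surrounding text, is:

\[ f_{n,j}^{DT}(x)=\sum_{l=1}^{L}c_{l}\,I(x\in R_{l}),\qquad \Omega_{i}(s,v)=\sum_{y_{L}^{i}\in y_{L}}\bigl(y_{L}^{i}-c_{L}\bigr)^{2}+\sum_{y_{R}^{i}\in y_{R}}\bigl(y_{R}^{i}-c_{R}\bigr)^{2} \]

\[ c_{L}^{RF}=E[y_{L}],\qquad c_{R}^{RF}=E[y_{R}],\qquad (s,v)_{CRF}=\mathrm{rand}\{(s,v)_{i}\} \]

\[ Z_{n}=f_{n}^{FML}(X)=\bigl[f_{n,RF}^{FML}(X)\mid f_{n,CRF}^{FML}(X)\bigr]\in\mathbb{R}^{N_{Raw}\times 2},\qquad Z^{N}=[Z_{1},Z_{2},\ldots,Z_{N}]\in\mathbb{R}^{N_{Raw}\times 2N} \]

On this reading, the CRF uses the same least-squares loss but draws its split point completely at random, as stated above.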
Further, S2 of constructing the latent feature extraction layer, extracting a latent feature from a feature space of a fully connected hybrid matrix according to a contribution rate, guaranteeing maximum transmission and minimum redundancy of potentially valuable information based on an information measurement criterion, and reducing model complexity and computation consumption specifically includes:
- obtaining and expressing a fully connected hybrid matrix A by combining original input data X from six different stages in the MSWI process with a mapped feature matrix ZN as follows:
- where A includes NRaw samples and a (M+2N) dimensional feature;
- considering that a dimension of A is much higher than a dimension of the original data, minimizing redundant information in A through principal component analysis (PCA), and computing a correlation matrix R of A as follows:
- obtaining (M+2N) feature values and corresponding feature vectors by performing singular value decomposition on R as follows:
- where U(M+2N) represents a (M+2N) order orthogonal matrix, Σ(M+2N) represents a (M+2N) order diagonal matrix, and V(M+2N) represents a (M+2N) order orthogonal matrix;
- where σ1>σ2> . . . >σ(M+2N) represents eigenvalues arranged in a descending order;
- determining a number of final principal components according to a set latent feature contribution threshold η,
- where a number of latent features is QPCA«(M+2N);
- based on the QPCA latent features determined above, obtaining a feature vector matrix VQPCA (that is, a projection matrix of A) corresponding to the eigenvalue set {σq}q=1QPCA, minimizing redundant information by performing feature projection on A, and recording an obtained latent feature as XPCA, that is,
- where VQPCAϵR(M+2N)×QPCA represents the eigenvectors of the first QPCA latent features;
- computing a mutual information value IMI between a selected latent feature XPCA and a truth yϵRNRaw×1 (see the reconstruction following this list) as follows:
- where p(xqPCA, y) represents a joint probability distribution of the qth latent feature xqPCA and a DXN emission concentration truth y, p(xqPCA) represents a marginal probability distribution of the qth latent feature xqPCA, and p(y) represents a marginal probability distribution of the DXN emission concentration truth y;
- guaranteeing a correlation between the selected latent feature and the truth through an information maximization selection mechanism, and expressing the correlation as:
- where {IqMI}q=1QPCA represents the mutual information values between the QPCA latent features xqPCA and the truth y, ζ represents an information maximization threshold, and the retained features are the QPCAMI latent features having a greatest correlation with information of the DXN emission concentration truth y; and
- obtaining a new data set {X′, y}ϵRNRaw×(QPCAMI+1) including QPCAMI latent features, and setting a dimension MPCAMI=QPCAMI after extraction.
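The mutual information computation above is likewise omitted; assuming the standard discrete definition, consistent with the probability terms defined above:

\[ I_{q}^{MI}=\sum_{x_{q}^{PCA}}\sum_{y}p\bigl(x_{q}^{PCA},y\bigr)\log\frac{p\bigl(x_{q}^{PCA},y\bigr)}{p\bigl(x_{q}^{PCA}\bigr)p(y)},\qquad q=1,\ldots,Q_{PCA} \]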
Further, S3 of constructing the feature incremental layer, and further enhancing a feature representation capacity by training the feature incremental layer based on an extracted latent feature specifically includes:
- obtaining J training subsets of a hybrid forest algorithm by performing Bootstrap sampling and RSM sampling on the new data set {X′,y} as follows:
- where X′Bootstrapk,j and yBootstrapk,j represent an input and an output of the jth training subset, X′ and y represent an input and an output of a new training set respectively, ϕkFEL(⋅) represents Bootstrap sampling of the kth hybrid forest group, and φkFEL(⋅) represents RSM sampling of the kth hybrid forest group;
- taking construction of the jth decision tree of the RF in the kth hybrid forest group as an example as follows:
- where fk,jDT-RF(⋅) represents a jth decision tree of the RF in the kth hybrid forest group in the feature incremental layer; L represents a number of leaf nodes of the decision tree; and cl is computed through recursive splitting with specific process formulas of (3)-(5);
- obtaining an RF model in the kth hybrid forest group in the feature incremental layer and expressing the RF model as follows:
- similarly taking construction of the jth decision tree of the CRF in the kth hybrid forest group as an example as follows:
- where fk,jDT-CRF(⋅) represents a jth decision tree of the CRF in the kth hybrid forest group in the feature incremental layer; and cl is computed through recursive splitting with specific process formulas of (6)-(7);
- obtaining a CRF model in the kth hybrid forest group in the feature incremental layer and expressing the CRF model as follows:
- obtaining the kth hybrid forest group fkFEL(⋅) through the above process, and expressing a kth enhanced feature as follows:
- where (c1,lk,RF, c1,lk,CRF) represents enhanced mapping on a first sample in new data through the kth hybrid forest group, (cnRaw,lk,RF,cnRaw,lk,CRF) represents enhanced mapping on an nRawth sample in the new data through the kth hybrid forest group, and (cNRaw,lk,RF,cNRaw,lk,CRF) represents enhanced mapping on an NRawth sample in the new data through the kth hybrid forest group;
- expressing an output HK of the feature incremental layer as follows:
- where H1 represents a first enhanced feature, H2 represents a second enhanced feature, and HK represents a Kth enhanced feature;
- expressing a BHFR model without considering the incremental learning strategy as follows:
- where GK represents a combination of outputs from the feature mapping layer and the feature incremental layer, that is GK=[ZN|HK], and includes NRaw samples and a (2N+2K) dimensional feature; and WK represents a weight of each of the feature mapping layer and the feature incremental layer relative to the output layer, and is computed as follows:
- where I represents a unit matrix, and λ represents a regularization term coefficient; and accordingly, expressing a pseudo-inverse computation of GK as:
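The weight formula referenced above is omitted; a plausible reconstruction, assuming the ridge-regularized least-squares solution standard for broad learning systems:

\[ W^{K}=\Bigl(\lambda I+\bigl(G^{K}\bigr)^{T}G^{K}\Bigr)^{-1}\bigl(G^{K}\bigr)^{T}y,\qquad \bigl(G^{K}\bigr)^{+}=\lim_{\lambda\to 0}\Bigl(\lambda I+\bigl(G^{K}\bigr)^{T}G^{K}\Bigr)^{-1}\bigl(G^{K}\bigr)^{T} \]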
Further, S4 of constructing the incremental learning layer based on an incremental learning strategy, obtaining a weight matrix with Moore-Penrose pseudo-inverse, and implementing high-precision modeling of the BHFR soft sensor model specifically includes:
- obtaining a training subset of the hybrid forest algorithm by performing Bootstrap sampling and RSM sampling on the new data set {X′,y} in a process as follows:
- where X′Bootstrapp,j and yBootstrapp,j represent the input and the output of the jth training subset of the hybrid forest algorithm, X′ and y represent the input and the output of the new training set respectively, and ϕpILL(⋅) and φpILL(⋅) represent Bootstrap sampling and RSM sampling of a pth hybrid forest group in the incremental learning layer;
- constructing decision trees fp,RFILL(⋅) and fp,CRFILL(⋅) in the pth hybrid forest group in the same process (not repeated herein) as in the feature mapping layer and the feature incremental layer;
- under the condition that one hybrid forest group is added, expressing outputs GK+1 of the feature mapping layer, the feature incremental layer and the incremental learning layer as follows:
- where GK=[ZN|HK] includes NRaw samples and a (2N+2K) dimensional feature, and GK+1 includes NRaw samples and a (2N+2K+2J) dimensional feature;
- recursively updating a Moore-Penrose inverse matrix of GK+1 as follows:
- where a matrix C and a matrix D are computed as follows:
- expressing a recursive formula of the Moore-Penrose inverse matrix of GK+1 as follows:
- computing an updating matrix WK+1 of a weight of the feature mapping layer, the feature incremental layer, the incremental learning layer relative to the output layer as follows:
- implementing rapid incremental learning due to the fact that a pseudo-inverse matrix of the hybrid forest group in the incremental learning layer is merely required to be computed according to a pseudo-inverse updating strategy above;
- implementing adaptive incremental learning according to a convergence degree of a training error;
- determining a number P of hybrid forest groups of incremental learning by defining a convergence threshold of the error as θCon, and expressing accordingly an incremental learning training error of the BHFR model as follows:
- where the left-hand side of the above expression represents a training error difference between the (p+1)th and the pth hybrid forest groups of incremental learning, and √((GK+pWK+p−y)2) and √((GK+p+1WK+p+1−y)2) represent training errors of a BHFR model including p hybrid forest groups and a BHFR model including p+1 hybrid forest groups respectively; and
- expressing a predicted output Ŷ of the BHFR soft sensor model as follows:
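The update formulas of S4 are omitted above; the matrices C and D they reference match the pseudo-inverse recursion standard for broad learning systems, which is assumed here as a plausible reconstruction rather than a verbatim reproduction:

\[ G^{K+1}=\bigl[G^{K}\mid H^{K+1}\bigr],\qquad D=\bigl(G^{K}\bigr)^{+}H^{K+1},\qquad C=H^{K+1}-G^{K}D \]

\[ B^{T}=\begin{cases}C^{+}, & C\neq 0\\ \bigl(I+D^{T}D\bigr)^{-1}D^{T}\bigl(G^{K}\bigr)^{+}, & C=0\end{cases}\qquad \bigl(G^{K+1}\bigr)^{+}=\begin{bmatrix}\bigl(G^{K}\bigr)^{+}-DB^{T}\\ B^{T}\end{bmatrix},\qquad W^{K+1}=\begin{bmatrix}W^{K}-DB^{T}y\\ B^{T}y\end{bmatrix} \]

On this reading, the predicted output referenced in the final step is \( \hat{Y}=G^{K+P}W^{K+P} \).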
According to a specific embodiment of the present disclosure, the present disclosure has the following technical effects: the BHFR-based soft sensor method for DXN emission in an MSWI process according to the present disclosure constructs a soft sensor model based on the BHFR, and the soft sensor model combines algorithms such as broad learning modeling, ensemble learning and latent feature extraction: 1) based on a broad learning system (BLS) framework, a non-differential learner is used to construct the soft sensor model including a feature mapping layer, a latent feature extraction layer, a feature incremental layer and an incremental learning layer; 2) internal information of the BHFR model is processed through full information connection, latent feature extraction and mutual information measurement, so as to effectively ensure maximum transmission and minimum redundancy of internal feature information of the BHFR model; 3) a hybrid forest group is used as a mapping unit for incremental learning in a modeling process, a weight matrix of an output layer is rapidly computed according to a pseudo-inverse strategy, and the incremental learning is adaptively adjusted based on a convergence degree of a training error, thereby implementing high-precision soft sensor modeling. The validity and rationality of the method are verified by using an industrial process dioxin (DXN) data set.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings required in the embodiments are briefly described below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and other drawings can be derived from these accompanying drawings by those of ordinary skill in the art without creative efforts.
FIG. 1 is a flowchart of a broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in a municipal solid waste incineration (MSWI) process according to an embodiment of the present disclosure;
FIG. 2 is a technical flowchart of an MSWI process according to an embodiment of the present disclosure;
FIG. 3 is a training error convergence curve according to an embodiment of the present disclosure;
FIG. 4A is a fitting curve of a training set in a DXN data set according to an embodiment of the present disclosure;
FIG. 4B is a fitting curve of a verification set in a DXN data set according to an embodiment of the present disclosure; and
FIG. 4C is a fitting curve of a testing set in a DXN data set according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The technical solutions of embodiments of the present disclosure will be clearly and completely described below with reference to accompanying drawings. Apparently, the described embodiments are merely some embodiments rather than all embodiments of the present disclosure. All other embodiments derived by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative efforts shall fall within the protection scope of the present disclosure.
An objective of the present disclosure is to provide a broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in a municipal solid waste incineration (MSWI) process. A soft sensor modeling algorithm based on broad hybrid forest regression (BHFR) is provided with measurement of a dioxin (DXN) emission concentration in the MSWI process as the target.
To make the above objectives, features, and advantages of the present disclosure clearer and more comprehensible, the present disclosure will be further described below in detail with reference to the accompanying drawings and specific implementation modes.
As shown in FIG. 1, the broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in an MSWI process according to the present disclosure includes:
- based on a broad learning system (BLS) framework, a BHFR soft sensor model is constructed for small sample high-dimensional data by replacing a neuron with a non-differential base learner, where the BHFR soft sensor model includes a feature mapping layer, a latent feature extraction layer, a feature incremental layer and an incremental learning layer, and the method specifically includes:
- S1, the feature mapping layer is constructed, and a high-dimensional feature is mapped by constructing a hybrid forest group composed of a random forest (RF) and a completely random forest (CRF).
S2, the latent feature extraction layer is constructed, a latent feature is extracted from a feature space of a fully connected hybrid matrix according to a contribution rate, maximum transmission and minimum redundancy of potentially valuable information are guaranteed based on an information measurement criterion, and model complexity and computation consumption are reduced.
S3, the feature incremental layer is constructed, and a feature representation capacity is further enhanced by training the feature incremental layer based on an extracted latent feature.
S4, the incremental learning layer is constructed based on an incremental learning strategy, a weight matrix is obtained with a Moore-Penrose pseudo-inverse, and high-precision modeling of the BHFR soft sensor model is implemented.
S5, the soft sensor model is verified by using an industrial process dioxin (DXN) data set.
S6, soft sensing of DXN emission in the MSWI process is performed with the soft sensor model constructed in steps S1-S5.
The MSWI process includes technical stages such as solid waste storage and transportation, solid waste incineration, waste heat boilers, steam power generation, flue gas purification and flue gas emission. A grate MSWI process with a daily capacity of 800 tons is taken as an example, and a technical flow is shown in FIG. 2.
An entire process of DXN decomposition, generation, adsorption and emission is combined to describe main functions at each stage as follows:
1) At the solid waste storage and transportation stage: sanitation vehicles transport MSW from various collection stations in a city to an MSWI power plant, and after being weighed and recorded, the MSW is dumped from an unloading platform into an unfermented area of a solid waste storage pool, mixed and stirred with a solid waste grab, grabbed to a fermentation area, and fermented and dehydrated for 3-7 days to guarantee a low calorific value of MSW incineration. Research shows that primary MSW contains a small amount of DXN (about 0.8 ng TEQ/kg) and a variety of chlorine-containing compounds needed for formation of DXN.
2) At the solid waste incineration stage: fermented MSW is thrown by the solid waste grab into a feed hopper and is pushed into an incinerator through a feeder. After passing in turn through the drying grate, combustion grates 1 and 2 and the burning-through grate, combustible components in the MSW are completely burned; required combustion-supporting air is injected from a lower portion of the grate and a middle portion of the furnace by a primary fan and a secondary fan, and ash produced by final combustion falls from the end of the burning-through grate onto a submerged chain conveyor and is sent to a slag pool after water cooling. In order to ensure that the DXN contained in the original MSW and generated during incineration is completely decomposed under the high-temperature combustion conditions in the furnace, the flue gas temperature during furnace combustion is required to be controlled strictly above 850° C., and the residence time of the high-temperature flue gas in the furnace must exceed 2 seconds with sufficient flue gas turbulence guaranteed.
3) At the waste heat boiler stage: the high-temperature flue gas (higher than 850° C.) generated in the furnace is pumped into a waste heat boiler system by an induced draft fan, and then passes through a superheater, an evaporator and an economizer. After heat exchange between the high-temperature flue gas and liquid water in a boiler drum, high-temperature steam is generated, the high-temperature flue gas is cooled, and the temperature of the flue gas at an outlet of the waste heat boiler (that is, flue gas G1) is lower than 200° C. From the perspective of the formation mechanism of DXN, the chemical reactions that lead to the formation of DXN when the high-temperature flue gas is cooled by the waste heat boiler include high-temperature gas phase synthesis (800° C.-500° C.), precursor synthesis (450° C.-200° C.) and de novo synthesis (350° C.-250° C.), but there is no unified conclusion yet.
4) At the steam power generation stage: the high-temperature steam generated by the waste heat boiler is used to drive a turbine generator, and mechanical energy is converted into electrical energy, so as to implement self-sufficiency of power consumption at a plant level and on-grid power supply of surplus power, resource recycling and obtain economic benefits.
5) At the flue gas purification stage: the flue gas purification in the MSWI process mainly includes a series of processes such as denitration (NOx), desulfurization (HCl, HF, SO2, etc.), heavy metal removal (Pb, Hg, Cd, etc.), dioxin (DXN) adsorption and dust removal (particulate matter), so as to satisfy emission standards for incineration flue gas pollutants. Adsorbing DXN in incineration flue gas with an activated carbon injection system is the most widely used technical means, and the adsorbed DXN is enriched in fly ash.
6) At the flue gas emission stage: incineration flue gas including trace DXN after cooling and purification (that is, flue gas G2) is sucked by the induced draft fan and emitted into the atmosphere through a chimney. The uninterrupted, long-term running of the MSWI process leads to a large quantity of DXN attached to particles on an inner wall of the chimney (that is, memory effects). The influence of these memory effects on emission under actual operation conditions is still a research problem at present.
At present, research on the soft sensor of DXN in the MSWI process mainly focuses on measurement of the DXN concentration at the emission stage (that is, flue gas G3), and a research focus of the present application is to construct a soft sensor model for flue gas G3.
The BHFR modeling strategy according to the present application includes the feature mapping layer, the latent feature extraction layer, the feature incremental layer and the incremental learning layer.
In FIG. 1, {X,y}ϵRNRaw×(M+1) represents original data, where XϵRNRaw×M represents original input data that are sourced from the six different stages in the MSWI process above and are collected and stored in a distributed control system (DCS) in seconds, NRaw represents a number of the original data, M represents a dimension of the original input data, and yϵRNRaw×1 represents an output truth of a DXN emission concentration sourced from an emission DXN measurement sample obtained through an off-line measurement method; {DT1, . . . , DTJ} represents J decision tree models in a hybrid forest algorithm, DT1 represents the first decision tree model and DTJ represents the Jth decision tree model; Bootstrap and RSM represent sample sampling and feature sampling on input data respectively; {RFn,CRFn} represents the nth hybrid forest group model, and RFn and CRFn represent the nth RF and CRF models; {Groupn}n=1N represents that the feature mapping layer includes N hybrid forest group models; ZN represents an output from the feature mapping layer; HK represents an output from the feature incremental layer; [X|ZN] represents a fully connected hybrid matrix composed of the original data and ZN; X′ϵRNRaw×MPCAMI represents new training data after latent feature extraction; {Groupk}k=1K represents K hybrid forest group models included in the feature incremental layer; {Groupp}p=1P represents P hybrid forest group models included in the incremental learning layer; and WK+P represents a final weight matrix.
Each part has main functions as follows:
1) The feature mapping layer: the original input data XϵRNRaw×M from six different stages in the MSWI process are subjected to feature mapping through the N hybrid forest groups {RFn,CRFn}n=1N in the feature mapping layer to obtain a mapping output matrix ZN.
2) The latent feature extraction layer: a latent feature is extracted, through principal component analysis, from the fully connected hybrid matrix [X|ZN] composed of the original input data XϵRNRaw×M and the output ZN from the feature mapping layer, redundant information is removed from the feature space, and a latent feature dimension is further determined and a new training set X′ϵRNRaw×MPCAMI is obtained through mutual information between an extracted latent feature and an output truth y of the DXN emission concentration.
3) The feature incremental layer: the new training set X′ϵRNRaw×MPCAMI is taken as an input, feature mapping is performed through the K hybrid forest groups {RFk,CRFk}k=1K of the feature incremental layer, and an output matrix HK of the feature incremental layer is obtained.
4) The incremental learning layer: the new training set X′ϵRNRaw×MPCAMI is taken as an input, and the weight WK+P is gradually increased and updated with the hybrid forest group as the minimum unit until a training error converges.
Essentially, the BHFR replaces a neuron in the original BLS by taking a hybrid forest group composed of the RF and the CRF as the basic mapping unit. S1, in which the feature mapping layer is constructed and a high-dimensional feature is mapped by constructing a hybrid forest group composed of an RF and a CRF, specifically includes:
- original data are set as {X,y}, where XϵRNRaw×M represents original input data that are sourced from six different stages in the MSWI process and are collected and stored in a distributed control system (DCS) in seconds, NRaw represents a number of the original data, M represents a dimension of the original input data, and yϵRNRaw×1 represents an output truth of a DXN emission concentration sourced from an emission DXN measurement sample obtained through an off-line measurement method; a modeling process of the feature mapping layer is described by taking the nth hybrid forest group of the feature mapping layer as an example:
- J training subsets of a hybrid forest group model are obtained by performing Bootstrap and random subspace (RSM) sampling on {X,y} as follows:
where XBootstrapn,j and yBootstrapn,j represent an input and an output of the jth training subset, ϕnFML(⋅) and φnFML(⋅) represent Bootstrap sampling and RSM sampling of the nth hybrid forest group in the feature mapping layer, and PBootstrap represents a Bootstrap sampling probability;
- a hybrid forest algorithm including J decision trees is trained based on {XBootstrapn,j,yBootstrapn,j}j=1J, where the jth decision tree of the nth hybrid forest group in the feature mapping layer is expressed as follows:
- where L represents a number of leaf nodes of the decision tree, I(⋅) represents an indicator function, and cl is computed through recursive splitting;
- a splitting loss function Ωi(⋅) of the decision tree in the RF is expressed as:
- where Ωi(s,v) represents the loss function value of the splitting criterion when the sth feature is split at value v, yL represents a truth vector of a DXN emission concentration at a left leaf node, E[yL] represents a mathematical expectation of yL, yR represents a truth vector of a DXN emission concentration at a right leaf node, E[yR] represents a mathematical expectation of yR, yLi represents the ith DXN emission concentration truth at the left leaf node, yRi represents the ith DXN emission concentration truth at the right leaf node, cL represents a predicted output of the DXN emission concentration at the left leaf node, and cR represents a predicted output of the DXN emission concentration at the right leaf node;
- by minimizing Ωi(s,v), a training set (XBootstrapn,j,yBootstrapn,j) is split into two tree nodes as follows:
- where RLNL×M and RRNR×M represent sample sets included in a left tree node and a right tree node after division respectively, NL and NR represents a number of samples in RLNL×M and RRNR×M respectively;
- predicted output values cLRF and cRRF of the DXN emission concentration at a current left tree node and a current right tree node are expressed as sample truth expectations as follows:
- where yL and yR represent truth vectors of the DXN emission concentration in RLNL×M and RRNR×M, and E[yL] and E[yR] represent mathematical expectations of yL and yR;
- different from the RF, a decision tree in the CRF is split in a completely random selection mode expressed as follows:
- where rand{(s,v)i}i=1NRaw×M represents that a value v of the sth feature is completely randomly selected as a split point;
- predicted output values cLCRF and cRCRF of the DXN emission concentration at a left tree node and a right tree node that are randomly split as sample truth expectations are expressed as follows:
- through the above process, the nth hybrid forest group fnFML(⋅) is expressed as follows:
- where fn,RFFML(⋅) represents the nth random forest, and fn,CRFFML represents the nth completely random forest;
- the nth mapped feature Zn is expressed as
- where (c1,ln,RF, c1,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of a first sample of original input data from six different stages in the MSWI process, (cnRaw,ln,RF,cnRaw,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of the nRawth sample of original input data from six different stages in the MSWI process, and (cNRaw,ln,RF,cNRaw,ln,CRF) represents a mapped feature, obtained through the nth hybrid forest group, of the NRawth sample of original input data from six different stages in the MSWI process; and
- an output of the feature mapping layer is expressed as:
- where Z1 represents the first mapped feature, Z2 represents the second mapped feature, ZN represents the Nth mapped feature, and a mapped feature matrix ZN includes NRaw samples and a 2N dimensional feature.
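To make the construction of S1 concrete, the following is a minimal Python sketch of the feature mapping layer, assuming scikit-learn estimators as stand-ins for the forests described above: RandomForestRegressor for the RF (Bootstrap plus RSM sampling) and ExtraTreesRegressor with max_features=1 as an approximation of the CRF. The function names and hyperparameter values are illustrative only and are not part of the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor

def fit_hybrid_forest_group(X, y, n_trees=10, min_leaf=7, seed=0):
    """Train one {RF, CRF} pair and return its two-column mapped feature Z_n."""
    rf = RandomForestRegressor(n_estimators=n_trees, min_samples_leaf=min_leaf,
                               max_features="sqrt",  # RSM feature sampling
                               bootstrap=True,       # Bootstrap sample sampling
                               random_state=seed)
    crf = ExtraTreesRegressor(n_estimators=n_trees, min_samples_leaf=min_leaf,
                              max_features=1,        # completely random splits
                              bootstrap=True, random_state=seed)
    rf.fit(X, y)
    crf.fit(X, y)
    return rf, crf, np.column_stack([rf.predict(X), crf.predict(X)])

def feature_mapping_layer(X, y, n_groups=10):
    """N hybrid forest groups -> mapped feature matrix Z^N (N_Raw x 2N)."""
    groups, zs = [], []
    for n in range(n_groups):
        rf, crf, z_n = fit_hybrid_forest_group(X, y, seed=n)
        groups.append((rf, crf))
        zs.append(z_n)
    return groups, np.hstack(zs)
```

Each group contributes two columns (the RF and CRF predicted outputs), so N groups yield the NRaw×2N mapped feature matrix ZN described above.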
In order to avoid an over-fitting phenomenon caused by an information loss during information transmission, the BHFR according to the present application uses a full connection strategy to implement information transmission between the feature mapping layer, the feature incremental layer and the incremental learning layer. Moreover, in order to minimize information redundancy during model training, the PCA is used to extract the latent feature from a feature space of the fully connected hybrid matrix, and then mutual information is used to further screen a latent feature related to maximization of truth information, so as to reduce a dimension of high-dimensional data.
S2 that the latent feature extraction layer is constructed, a latent feature is extracted from a feature space of a fully connected hybrid matrix according to a contribution rate, maximum transmission and minimum redundancy of potentially valuable information are guaranteed based on an information measurement criterion, and model complexity and computation consumption are reduced specifically includes:
- a fully connected hybrid matrix A is obtained and expressed by combining original input data X from six different stages in the MSWI process with the mapped feature matrix ZN as follows:
- where A includes NRaw samples and a (M+2N) dimensional feature;
- considering that a dimension of A is much higher than a dimension of the original data, redundant information in A is minimized through principal component analysis (PCA), and a correlation matrix R of A is computed as follows:
- (M+2N) feature values and corresponding feature vectors are obtained by performing singular value decomposition on R as follows:
- where U(M+2N) represents a (M+2N) order orthogonal matrix, Σ(M+2N) represents a (M+2N) order diagonal matrix, and V(M+2N) represents a (M+2N) order orthogonal matrix;
- where σ1>σ2> . . . >σ(M+2N) represents eigenvalues arranged in a descending order;
- a number of final principal components is determined according to a set latent feature contribution threshold η,
- where a number of latent features is QPCA«(M+2N);
- based on the QPCA latent features determined above, a feature vector matrix VQPCA (that is, a projection matrix of A) corresponding to the eigenvalue set {σq}q=1QPCA is obtained, redundant information is minimized by performing feature projection on A, and an obtained latent feature is recorded as XPCA, that is,
where VQPCAϵR(M+2N)×QPCA represents the eigenvectors of the first QPCA latent features;
- a mutual information value IMI between a selected latent feature XPCA and a truth yϵRNRaw×1 is computed as follows:
- where p(xqPCA,y) represents a joint probability distribution of the qth latent feature xqPCA and a DXN emission concentration truth y, p(xqPCA) represents a marginal probability distribution of the qth latent feature xqPCA, and p(y) represents a marginal probability distribution of the DXN emission concentration truth y;
- a correlation between the selected latent feature and the truth is guaranteed through an information maximization selection mechanism, and the correlation is expressed as:
- where {IqMI}q=1QPCA represents the mutual information values between the QPCA latent features xqPCA and the truth y, ζ represents an information maximization threshold, and the retained features are the QPCAMI latent features having a greatest correlation with information of the DXN emission concentration truth y; and
- a new data set {X′,y}ϵRNRaw×(QPCAMI+1) including QPCAMI latent features is obtained, and a dimension MPCAMI=QPCAMI after extraction is set.
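A minimal Python sketch of the latent feature extraction layer follows, assuming scikit-learn's PCA and mutual_info_regression in place of the singular value decomposition and mutual information computation described above. The text does not specify exactly how the threshold ζ is applied, so screening features whose mutual information reaches a fraction ζ of the maximum is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_regression

def latent_feature_extraction(X, Z, y, eta=0.9, zeta=0.75):
    """PCA on the fully connected hybrid matrix A=[X|Z], then MI screening."""
    A = np.hstack([X, Z])                           # fully connected hybrid matrix
    pca = PCA(n_components=eta, svd_solver="full")  # keep eta of the variance
    X_pca = pca.fit_transform(A)                    # Q_PCA latent features
    mi = mutual_info_regression(X_pca, y)           # MI with the DXN truth y
    keep = mi >= zeta * mi.max()                    # information maximization screen
    return X_pca[:, keep], pca, keep
```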
S3 that the feature incremental layer is constructed, and a feature representation capacity is further enhanced by training the feature incremental layer based on an extracted latent feature specifically includes:
- J training subsets of a hybrid forest algorithm are obtained by performing Bootstrap sampling and RSM sampling on the new data set {X′,y} as follows:
- where X′Bootstrapk,j and yBootstrapk,j represent an input and an output of the jth training subset, X′ and y represent an input and an output of a new training set respectively, ϕkFEL(⋅) represents Bootstrap sampling on the kth hybrid forest group, and φkFEL(⋅) represents RSM sampling on the kth hybrid forest group;
- construction of the jth decision tree of the RF in the kth hybrid forest group is taken as an example as follows:
- where fk,jDT-RF(⋅) represents a jth decision tree of the RF in the kth hybrid forest group in the feature incremental layer; L represents a number of leaf nodes of the decision tree; and cl is computed through recursive splitting with specific process formulas of (3)-(5);
- an RF model in the kth hybrid forest group in the feature incremental layer is obtained and the RF model is expressed as follows:
- construction of the jth decision tree of the CRF in the kth hybrid forest group is taken as an example similarly as follows:
- where fk,jDT-CRF(⋅) represents the jth decision tree of the CRF in the kth hybrid forest group in the feature incremental layer; and cl is computed through recursive splitting with specific process formulas of (6)-(7);
- a CRF model in the kth hybrid forest group in the feature incremental layer is obtained and the CRF model is expressed as follows:
- the kth hybrid forest group fkFEL(⋅) through the above process is obtained, and the kth enhanced feature is expressed as follows:
where (c1,lk,RF, c1,lk,CRF) represents enhanced mapping on a first sample in new data through the kth hybrid forest group, (cnRaw,lk,RF,cnRaw,lk,CRF) represents enhanced mapping on the nRawth sample in the new data through the kth hybrid forest group, and (cNRaw,lk,RF,cNRaw,lk,CRF) represents enhanced mapping on the NRawth sample in the new data through the kth hybrid forest group;
- an output HK of the feature incremental layer is expressed as follows:
- where H1 represents a first enhanced feature, H2 represents a second enhanced feature, and HK represents the Kth enhanced feature;
- a BHFR model is expressed without considering the incremental learning strategy as follows:
- where GK represents a combination of outputs from the feature mapping layer and the feature incremental layer, that is GK=[ZN|HK], and includes NRaw samples and a (2N+2K) dimensional feature; and WK represents a weight of each of the feature mapping layer and the feature incremental layer relative to the output layer, and is computed as follows:
- where I represents a unit matrix, and λ represents a regularization term coefficient; and a pseudo-inverse computation of GK is expressed accordingly as:
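In code, the weight computation above reduces to a ridge-regularized least-squares solve; the following minimal sketch assumes that standard closed form (not quoted from the text):

```python
import numpy as np

def output_weights(G, y, lam=2.0 ** -10):
    """W = (lam*I + G^T G)^{-1} G^T y for G = [Z^N | H^K]."""
    return np.linalg.solve(G.T @ G + lam * np.eye(G.shape[1]), G.T @ y)
```

With G of size NRaw×(2N+2K), the returned W has one weight per mapped or enhanced feature column.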
The BHFR according to the present application takes the hybrid forest group as a basic unit to implement incremental learning according to the convergence degree of a training error. S4 that the incremental learning layer is constructed based on an incremental learning strategy, a weight matrix is obtained with a Moore-Penrose pseudo-inverse, and high-precision modeling of the BHFR soft sensor model is implemented specifically includes:
- a training subset of the hybrid forest algorithm is obtained by performing Bootstrap sampling and RSM sampling on the new data set {X′,y} in a process as follows:
- where X′Bootstrapp,j and yBootstrapp,j represent the input and the output of the jth training subset of the hybrid forest algorithm, X′ and y represent the input and the output of the new training set respectively, and ϕpILL(⋅) and φpILL(⋅) represent Bootstrap sampling and RSM sampling of a pth hybrid forest group in the incremental learning layer;
- decision trees fp,RFILL(⋅) and fp,CRFILL(⋅) in the pth hybrid forest group are constructed in the same process (not repeated herein) as the feature mapping layer and the feature incremental layer;
- under the condition that one hybrid forest group is added, outputs GK+1 of the feature mapping layer, the feature incremental layer and the incremental learning layer are expressed as follows:
- where GK=[ZN|HK] includes NRaw samples and a (2N+2K) dimensional feature, and GK+1 includes NRaw samples and a (2N+2K+2J) dimensional feature;
- a Moore-Penrose inverse matrix of GK+1 is updated recursively as follows:
- where a matrix C and a matrix D are computed as follows:
- a recursive formula of the Moore-Penrose inverse matrix of GK+1 is expressed as follows:
- an updating matrix WK+1 of a weight of the feature mapping layer, the feature incremental layer, the incremental learning layer relative to the output layer is computed as follows:
- rapid incremental learning is implemented due to the fact that a pseudo-inverse matrix of the hybrid forest group in the incremental learning layer is merely required to be computed according to a pseudo-inverse updating strategy above;
- adaptive incremental learning is implemented according to a convergence degree of a training error;
a number P of hybrid forest groups of incremental learning is determined by defining a convergence threshold of the error as θCon, and an incremental learning training error of the BHFR model is expressed accordingly as follows:
- where the left-hand side of the above expression represents a training error difference between the (p+1)th and the pth hybrid forest groups of incremental learning, and √((GK+pWK+p−y)2) and √((GK+p+1WK+p+1−y)2) represent training errors of a BHFR model including p hybrid forest groups and a BHFR model including p+1 hybrid forest groups respectively; and
a predicted output Ŷ of the BHFR soft sensor model is expressed as follows:
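A minimal Python sketch of the incremental update follows, implementing the pseudo-inverse recursion standard for broad learning systems; the matrices C and D correspond to those referenced above, while the zero-test tolerance and branch handling are assumptions. H is the block of new feature columns produced by the added hybrid forest group.

```python
import numpy as np

def incremental_update(G, G_pinv, W, H, y, tol=1e-12):
    """Append feature block H, update pseudo-inverse and weights recursively.

    y is an (N_Raw, 1) truth column vector; W is the current (m, 1) weight matrix.
    """
    D = G_pinv @ H                 # matrix D in the text
    C = H - G @ D                  # matrix C in the text
    if np.linalg.norm(C) > tol:
        B_T = np.linalg.pinv(C)
    else:
        B_T = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ G_pinv)
    G_new = np.hstack([G, H])
    G_pinv_new = np.vstack([G_pinv - D @ B_T, B_T])
    W_new = np.vstack([W - D @ (B_T @ y), B_T @ y])
    return G_new, G_pinv_new, W_new
```

In use, hybrid forest groups would be appended one at a time and the loop stopped once the drop in training error between consecutive groups falls below the convergence threshold θCon, as described above.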
The present application uses actual DXN data of an MSWI power plant for industrial verification. The DXN data are sourced from an MSWI power plant in Beijing and include 141 sets of DXN emission concentration modeling data from 2009 to 2020. The DXN truth is the concentration converted after 2-hour sampling and assay; after removal of missing data and abnormal variables, the input is 116-dimensional, and each input value is the mean over the sampling period corresponding to the DXN truth.
In the present application, a root mean square error (RMSE), a mean absolute error (MAE) and a coefficient of determination (R2) are selected to compare the performance of different methods with computation as follows:
- where N represents a number of data, yi represents the ith truth, ŷi represents the ith predicted value and ȳ represents a mean value.
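A minimal sketch of the three indices as conventionally defined (the formulas themselves are not reproduced in this text, so the standard definitions are assumed):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return RMSE, MAE and R2 for two equal-length 1-D arrays."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return rmse, mae, r2
```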
In the DXN data set, parameters of the BHFR method are set as follows: a minimum sample number Nsamples of leaf nodes of a decision tree is 7, a number of RSM feature selection is √Nfeatures, a number Ntree of decision trees is 10, a number NForest of hybrid forest groups in each of the feature mapping layer and the feature incremental layer is 10, a latent feature contribution rate threshold η is 0.9, and a regularization parameter λ is 2^(−10).
The number of latent features for the feature incremental layer and the incremental learning layer is determined from the fully connected hybrid matrix and its feature space A. The feature dimension of A in the DXN data set is 316. When the latent feature contribution rate threshold η is 0.9, the number of latent features selected from the DXN data set is 35. Then, mutual information values between the 35 latent features and the DXN truth are computed. The mutual information threshold ζ is set as 0.75, and the number of latent features finally selected from the DXN data set is 6.
Further, a number of hybrid forest group units in the incremental learning layer is preset to be 1000, and accordingly a relation between the training error of the BHFR model and a number of hybrid forest groups is shown in FIG. 3.
According to a training error curve shown in FIG. 3, a training process of the BHFR on the DXN data set may converge to a certain lower limit.
Then, RF, DFR, DFR-clfc and BLS-NN are used for comparison with the provided BHFR, and parameters are set as follows: (1) in the case of the RF, a minimum sample number Nsamples of leaf nodes of a decision tree is 3, a number of RSM feature selection is √Nfeatures, and a number Ntree of decision trees is 500; (2) in the case of DFR, a minimum sample number Nsamples of leaf nodes of a decision tree is 3, a number of RSM feature selection is √Nfeatures, a number Ntree of decision trees is 500, a number NRF of RF models and a number NCRF of CRF models in each layer are 2, and a total number of layers is set as 50; (3) in the case of DFR-clfc, a minimum sample number Nsamples of leaf nodes of a decision tree is 3, a number of RSM feature selection is √Nfeatures, a number Ntree of decision trees is 500, a number NRF of RF models and a number NCRF of CRF models in each layer are 2, and a total number of layers is set as 50; and (4) in the case of BLS-NN, a number Nm of feature nodes is 5, a number Ne of enhanced nodes is 41, a number Nn of neurons is 9 and the regularization parameter λ is 2^(30). Each of the above methods is repeated 20 times under the same conditions, and statistical results and prediction curves are shown in Table 1 and FIGS. 4A-4C.
TABLE 1
Experiment results of the DXN data set

Method     Data set           RMSE (Mean / Variance)        MAE (Mean / Variance)         R2 (Mean / Variance)
RF         Training set       1.1159E−02 / 5.7497E−08       9.0221E−03 / 4.0684E−08       8.5346E−01 / 3.9360E−05
           Verification set   2.0051E−02 / 1.8026E−07       1.4677E−02 / 8.2255E−08       5.0196E−01 / 4.3515E−04
           Testing set        1.6922E−02 / 1.6150E−07       1.3548E−02 / 8.9520E−08       5.9001E−01 / 3.7817E−04
DFR        Training set       1.1493E−02 / 8.7413E−09       9.4568E−03 / 4.6626E−09       8.4463E−01 / 6.3663E−06
           Verification set   2.0735E−02 / 9.7835E−09       1.5780E−02 / 1.1121E−08       4.6759E−01 / 2.5813E−05
           Testing set        1.7791E−02 / 1.7308E−08       1.4608E−02 / 1.5235E−08       5.4701E−01 / 4.5066E−05
DFR-clfc   Training set       8.0852E−03 / 2.9078E−06       6.6040E−03 / 2.0819E−06       9.1986E−01 / 1.1887E−03
           Verification set   2.0187E−02 / 1.4562E−07       1.5626E−02 / 2.3355E−08       4.9520E−01 / 3.6404E−04
           Testing set        1.7025E−02 / 1.5755E−07       1.4068E−02 / 6.0233E−08       5.8501E−01 / 3.7843E−04
BLS-NN     Training set       1.2924E−09 / 1.5756E−18       9.5358E−10 / 7.2150E−19       1.0000E+00 / 8.2358E−29
           Verification set   6.8845E−02 / 7.0040E−04       5.3153E−02 / 3.3474E−04       −5.6928E+00 / 3.7799E+01
           Testing set        7.8396E−02 / 6.7692E−04       6.0922E−02 / 4.1785E−04       −8.7153E+00 / 4.7630E+01
BHFR       Training set       6.0665E−03 / 1.6330E−08       3.9665E−03 / 8.4708E−09       9.5669E−01 / 3.3481E−06
           Verification set   2.1551E−02 / 3.5181E−08       1.2384E−02 / 3.5083E−08       4.2484E−01 / 9.8731E−05
           Testing set        1.6189E−02 / 2.2474E−08       1.1226E−02 / 1.0102E−08       6.2491E−01 / 4.8607E−05
It may be seen from Table 1 and FIGS. 4A-4C that: 1) RF is superior to DFR in the mean statistical results of RMSE, MAE and R2 during training, verification and testing, but inferior to DFR in stability. 2) DFR and DFR-clfc are close to RF in modeling accuracy and better than RF in modeling stability; DFR-clfc is slightly more accurate than DFR on the training set, the verification set and the testing set, but inferior to DFR in stability. 3) BLS-NN shows obvious over-fitting on the training data, and poor generalization performance and stability on the verification set and the testing set, indicating that BLS-NN is difficult to apply to the small sample high-dimensional data of the real industrial process considered in the present application. 4) BHFR achieves the best mean statistical results in the RMSE, MAE and R2 indicators on the testing set, and is second only to DFR in stability, indicating that BHFR has desirable generalization performance and stability.
To sum up, the DXN soft sensor modeling experiment shows that the BHFR according to the present application is better than classic RF, DFR and improved version DFR-clfc in training and learning capacity, is superior to RF, DFR, DFR-clfc and BLS-NN in modeling accuracy and a data fitting degree in the testing set, and shows obvious advantages in constructing the DXN soft sensor model.
The broad hybrid forest regression (BHFR)-based soft sensor method for DXN emission in an MSWI process according to the present disclosure constructs the soft sensor model based on the BHFR, and the soft sensor model combines algorithms such as broad learning modeling, ensemble learning and latent feature extraction: 1) based on the BLS framework, the non-differential learner is used to construct the soft sensor model including the feature mapping layer, the latent feature extraction layer, the feature incremental layer and the incremental learning layer; 2) internal information of the BHFR model is processed through full information connection, latent feature extraction and mutual information measurement, so as to effectively ensure maximum transmission and minimum redundancy of internal feature information of the BHFR model; 3) the hybrid forest group is used as a mapping unit for incremental learning in a modeling process, the weight matrix of the output layer is rapidly computed according to the pseudo-inverse strategy, and the incremental learning is adaptively adjusted based on the convergence degree of the training error, thereby implementing high-precision soft sensor modeling. The validity and rationality of the method are verified by using a high-dimensional industrial process DXN data set.
Specific examples are used herein to explain the principles and implementation modes of the present disclosure. The foregoing description of the embodiments is merely intended to help understand the method of the present disclosure and its core ideas. Besides, various modifications may be made by a person of ordinary skill in the art to specific embodiments and the scope of application in accordance with the ideas of the present disclosure. In conclusion, the content of the present specification shall not be construed as limitation to the present disclosure.