The present invention pertains in general to network modeling and, more particularly, to an approach for modeling a global network with a plurality of local networks, utilizing an ensemble approach that creates the global network by generalizing the outputs of the local networks.
In order to generate a model of a system for the purpose of utilizing that model in optimizing and/or controlling the operation of the system, it is necessary to generate a stored representation of that system wherein inputs generated in real time can be processed through the stored representation to provide on the output thereof a prediction of the operation of the system. Currently, a number of adaptive computational tools (nets, by way of definition) exist for approximating multi-dimensional mappings with application in regression and classification tasks. Some such tools are nonlinear perceptrons, radial basis function (RBF) nets, projection pursuit nets, hinging hyper-planes, probabilistic nets, random nets, high-order nets, multivariate adaptive regression splines (MARS) and wavelets, to name a few.
There is provided to each of these nets a multi-dimensional input for mapping through the stored representation to a lower dimensionality output. In order to define the stored representation, the model must be trained. Training of the model typically poses a non-linear, multi-variate optimization problem. With a large number of dimensions, a large volume of data is required to build an accurate model over the entire input space. Therefore, to accurately represent a system, a large amount of historical data needs to be collected, which is an expensive process, and processing these larger historical data sets compounds the computational burden. This is sometimes referred to as the “curse of dimensionality.” In the case of time-variable multi-dimensional data, this “curse of dimensionality” is intensified, because more inputs are required for modeling. For systems where data is sparsely distributed about the entire input space, such that it is “clustered” in certain areas, a more difficult problem exists, in that there is insufficient data in certain areas of the input space to accurately represent the entire system. Therefore, the confidence in results generated in the sparsely populated areas is low. For example, in power generation systems, there can be different operating ranges for the system: low load operation, intermediate load operation and high load operation. Each of these operational modes results in a certain amount of data that is clustered about the portion of the space associated with that operating mode and does not extend to other operating loads. In fact, there are regions of the operating space where it is not practical or economical to operate the system, thus resulting in no data in those regions with which to train the model. To build a network that traverses all of the different regions of the input space requires significant computational complexity. Further, the time required to train the network, especially with changing conditions, can be a difficult problem.
The present invention disclosed and claimed herein, in one aspect thereof, comprises a predictive global model for modeling a system. The global model includes a plurality of local models, each having: an input layer for mapping the input space into the space of the inputs of the basis functions, a hidden layer and an output layer. The hidden layer stores a representation of the system that is trained on a set of historical data, wherein each of the local models is trained on only a select and different portion of the set of historical data. The output layer is operable for mapping the hidden layer to an associated local output layer of outputs, wherein the hidden layer is operable to map the input layer through the stored representation to the local output layer. A global output layer is provided for mapping the outputs of all of the local output layers to at least one global output, the global output layer generalizing the outputs of the local models across the stored representations therein.
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying Drawings in which:
FIGS. 7a and 7b illustrate a flow chart depicting the ensemble operation;
a illustrates a diagrammatic view of the optimization algorithm for the ARG;
b illustrates a plot of minimizing the numbers of nodes;
Referring now to
The data from the input space is input to a global network 106 which is operable to map the input data through a stored representation of the plant or operating system to provide a predicted output. This predicted output is then used in an application 108. This application could be a digital control system, an optimizer, etc.
The global network, as will be described in more detail herein below, is comprised of a plurality of local networks 110, each associated with one of the regions 104. Each local network 110, in this illustration, is comprised of a non-linear neural network. However, other types of networks could be utilized, linear or non-linear. Each of these networks 110 is initially operable to store a representation of the plant, but trained only on data from the associated region 104, and provide a predicted output therefrom. In order to provide this representation, each of the individual networks 110 is trained only on the historical data set associated with the associated region 104. Thereafter, when data is input thereto, each of the networks 110 will provide a prediction on the output thereof. Thus, when data is input to all of the networks 110 from the input space 102, each will provide a prediction. Also, as will be described herein below, each of the networks 110 can have a different structure.
The prediction outputs for each of the networks 110 are input to a global net combining block 112 which is operable to combine all of the outputs in a weighted manner to provide the output of the global net 106. This is an operation wherein the outputs of the networks 110 are “generalized” over all of the networks 110. The weights associated with this global net combining block 112 are learned values which are trained in a manner that will be described in more detail herein below. It should be understood that when a new input pattern arrives, the global net 106 predicts the corresponding output based on the data previously included in the training set. To do so, it temporarily includes the new pattern in the closest cluster and obtains an associated local net output. With a small time lag, the net will also obtain the actual local net output (not the steady-state one). Thereafter, substituting the attributes of all local nets into the formula for the global net 106, the output of the global net 106 for the new pattern will be obtained. That completes the application for that instance. The next step is a recalculation step for recalculating the clustering parameters, retraining the corresponding local net and the global net, and then proceeding on to the next new pattern. This will be described in more detail herein below with respect to
Referring now to
Prior to understanding the clustering algorithm, a description of each of the local networks will be provided. In this embodiment, each of the local networks is comprised of a neural network, this being a nonlinear network. The neural network is comprised of an input layer 302 and an output layer 304 with a hidden layer 306 disposed therebetween. The input layer 302 is mapped through the hidden layer 306 to the output layer 304. The input is comprised of a vector x(t), which is a multi-dimensional input, and the output is a vector y(t), which is a multi-dimensional output. Typically, the dimensionality of the output is significantly lower than that of the input.
Referring now to
The Ensemble Approach (EA)
In order to provide a more computationally efficient learning algorithm for a neural network, an ensemble approach is utilized, which basically utilizes one approach for defining the basis functions in the hidden layer, which are a function of both the input values and internal parameters referred to as “weights,” and a second algorithm for training the mapping of the basis functions to the output node 402. The EA is the algorithm for training one-hidden-layer nets of the following form:
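Equation (001) itself does not appear in the text above; a form consistent with the parameter definitions that follow is the standard one-hidden-layer expansion (a reconstruction, not necessarily the exact original notation):

$$\tilde{f}(x, W) \;=\; w_0^{ext} \;+\; \sum_{n=1}^{N} w_n^{ext}\,\varphi_n\!\left(x,\, w_n^{int}\right), \qquad 0 \le N \le N_{max}. \qquad (001)$$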
where f̃(x, W) is the output of the net (which can be a scalar or a vector, usually low dimensional), x is the multi-dimensional input, {w_n^ext, n = 0, 1, ..., N_max} is the set of external parameters, {w_n^int, n = 1, ..., N_max} is the set of internal parameters, W is the set of net parameters, which includes both the external and internal parameters, {φ_n, n = 1, ..., N_max} is the set of (nonlinear) basis functions, and N_max is the maximal number of nodes, which depends on the class of application and on time and memory constraints. The external parameters are scalars or vectors according to whether the output is a scalar or a vector. The construction given by equation (1) is very general. Further, for simplicity of notation, it is assumed that there is only one output. In practice, the basis functions are implemented as superpositions of one-dimensional functions, as in the following equation:
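Equation (002) is likewise not reproduced above; one common realization consistent with this description, assumed here for concreteness, is a ridge-type basis function,

$$\varphi_n\!\left(x,\, w_n^{int}\right) \;=\; g\!\left(\sum_{i=1}^{d} w_{n,i}^{int}\, x_i\right), \qquad (002)$$

where g is a fixed one-dimensional nonlinearity (for example, a sigmoid) and d is the input dimension; the exact form used in equation (002) is an assumption.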
The following will provide a general description of the EA. The EA builds and keeps in memory all nets with the number of hidden nodes N, 0≦N≦Nmax, noting that each of the local nets can have a different number of hidden nodes associated therewith. However, since all of the local nets model the overall system and are mapped from the same input space, they will have the same inputs and, thus, substantially the same dimensionality between the inputs and the hidden layer.
Denote the historical data set as:
E = {(x_p, y_p), p = 1, ..., P}, (003)
where “p” denotes the pattern and (x_p, y_p) is an input-output pair connected by an unknown functional relationship y_p = ƒ(x_p) + ε_p, where ε_p is a stochastic process (“noise”) with zero mean value, unknown variance σ, and independent ε_p, p = 1, ..., P. The data set is first divided at random into three subsets (E_t, E_g and E_v), as follows:
E_t = {(x_p^t, y_p^t), p = 1, ..., P_t}, E_g = {(x_p^g, y_p^g), p = 1, ..., P_g}, (004)
and:
E_v = {(x_p^v, y_p^v), p = 1, ..., P_v} (005)
for training, testing (generalization), and validation, respectively. The union of the training set E_t and the generalization set E_g will be called the learning set E_l. The procedure of randomly dividing a set E into two parts E1 and E2 with probability p is denoted as divide (E, E1, E2, p), where each pattern from E goes to E1 with probability p, and to E2 = E − E1 with probability 1 − p. This procedure is first applied to divide the data set into learning and validation sets, sending data to the validation set with a probability of 0.03, by calling divide (E, E_l, E_v, 0.97). Then the learning data is divided into sets for training and generalization by calling divide (E_l, E_t, E_g, 0.75). The data set for validation is never used for learning and is used only for checking after learning is completed. For validation purposes only, roughly 3% of the total data is used. The remaining learning data is divided so that roughly 75% of the learning data goes to the training set while 25% is left for testing. The training data is completely used for training. The testing set is used after training is completed, for each of the nets with N, 0≦N≦Nmax, nodes, to calculate a set of testing errors, testMSE_N, for 0≦N≦Nmax. A special procedure optNumberNodes (testMSE) uses the set of testing errors to determine the optimal number of nodes for each local net, which will be described herein below. This procedure finds the global minimum of testMSE_N over N, 0≦N≦Nmax. (As will be described herein below with reference to
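A minimal sketch of this three-way split, assuming the probabilities given above (0.97/0.03 for learning versus validation and 0.75/0.25 for training versus testing); the function and variable names are illustrative, not taken from the original:

```python
import random

def divide(E, p, rng):
    """Split pattern list E into (E1, E2): each pattern goes to E1 with
    probability p, otherwise to E2 (the divide(E, E1, E2, p) procedure)."""
    E1, E2 = [], []
    for pattern in E:
        (E1 if rng.random() < p else E2).append(pattern)
    return E1, E2

def split_data(E, seed=0):
    """Three-way split: ~3% validation, then ~75%/25% training/testing."""
    rng = random.Random(seed)
    E_learn, E_valid = divide(E, 0.97, rng)   # validation set never used for learning
    E_train, E_test = divide(E_learn, 0.75, rng)
    return E_train, E_test, E_valid
```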
The algorithm for finding the number of nodes is as follows:
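The inequality itself is not reproduced in the text above; based on the description of the Percent parameter that follows, a consistent reconstruction is that the optimal number of nodes is the smallest N for which

$$\text{testMSE}_N \;\le\; \left(1 + \frac{\text{Percent}}{100}\right)\,\min_{0 \le M \le N_{max}} \text{testMSE}_M.$$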
The value of N satisfying the above inequality is called the optimal number of nodes and is denoted as N*. Two cases are shown in
The default value of the parameter Percent equals 20. This procedure will tolerate some increase in the minimal testing error in order to obtain a shorter net (with a smaller number of nodes). This is an algorithmic solution for the number of local net nodes. Another aspect of the training algorithm associated with the EA is training with noise. Originally, noise was added to the training output data before the start of training in the form of artificially simulated Gaussian noise with variance equal to the variance of the output in the training set. This added noise is multiplied by a variable Factor, manually adjusted for the area of application, with a default value of 0.25. Increasing the factor decreases net performance on the training data while improving performance on future predictions.
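A minimal sketch of the optNumberNodes procedure under the reconstruction above; the tolerance rule is an assumption consistent with the stated default Percent = 20, and the names are illustrative:

```python
def opt_number_nodes(test_mse, percent=20.0):
    """Return the smallest number of nodes N whose testing error is within
    `percent` percent of the global minimum testing error.
    test_mse[N] is the testing MSE of the net with N hidden nodes."""
    min_err = min(test_mse)
    tol = (1.0 + percent / 100.0) * min_err
    for n, err in enumerate(test_mse):
        if err <= tol:
            return n
    return len(test_mse) - 1
```

For example, opt_number_nodes([0.9, 0.32, 0.26, 0.25, 0.25]) returns 2, trading a small increase over the minimal testing error for a shorter net.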
For a more detailed description of the training, a diagrammatic view of how the network is trained may be more appropriate. With further reference to
In the ensemble approach, the Adaptive Stochastic Optimization (ASO) technique intertwines with the second algorithm, a Recursive Linear Regression (RLR) algorithm, comprising the basic recursive step of the learning procedure: building the trained and tested net with (N+1) hidden nodes from the previously trained and tested net with N hidden nodes (in the rest of this paragraph the word “hidden” will be omitted). The ASO freezes the nodes φ_1, ..., φ_N, which means keeping their internal vector weights w_1, ..., w_N frozen, and then generates the ensemble of candidates for the node φ_{N+1}, which means generating the ensemble of their internal vector weights {w_{N+1}}. The typical size of the ensemble is in the range of 50-200 members. The ASO goes through the ensemble of internal vector weights to find, by the end of the ensemble, the member w*_{N+1} which, together with the frozen w_1, ..., w_N, gives the net with N+1 nodes. This net is the best among all members in the ensemble of nets with N+1 nodes, which means the net with minimal training error. The weight w*_{N+1} becomes the new weight w_{N+1}, and the procedure for choosing all internal weights for the net with (N+1) nodes has been completed. So far, this discussion has been focused on the ASO and on the procedure for choosing internal weights. However, the calculation of the training error requires, first of all, building a net, which requires calculating the set of external parameters w_0^ext, w_1^ext, ..., w_{N+1}^ext. These external parameters are determined utilizing the RLR for each member of the ensemble. The RLR also includes the calculation of the net training error.
From the standpoint of the ASO function, prior to a detailed explanation herein below, this is an operation where a specially constructed Adaptive Random Generator (ARG) generates the ensemble of randomly chosen internal vector weights (samples). The first member of the ensemble is generated according to a flat probability density function. If the training error of a net with (N+1) nodes, corresponding to the next member of the ensemble, is less than the currently achieved minimal training error, then the ARG changes the probability density function utilizing this information.
With reference to
As was noted in the beginning of this section, the EA builds a set of nets, each with N nodes, 0≦N≦Nmax. This process starts with N=0. For this case the net output is a constant, whose optimal value can be calculated directly as
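The corresponding equation is not reproduced above; for a constant net, the value minimizing the training MSE is simply the mean of the training outputs, i.e.,

$$\tilde{f}_0 \;=\; w_0^{ext} \;=\; \frac{1}{P_t}\sum_{p=1}^{P_t} y_p^{t}.$$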
For the purpose of further discussion of the EA, the design matrix P_N and its pseudo-inverse P_N^+ for the net with an arbitrary number of nodes N are defined as:
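Equation (009) is not reproduced above; a reconstruction consistent with the recurrences described below is the P_t × (N+1) matrix of basis-function values on the training inputs,

$$P_N \;=\; \bigl[\,\mathbf{p}_0,\ \mathbf{p}_1,\ \ldots,\ \mathbf{p}_N\,\bigr], \qquad \mathbf{p}_0 = [1,\ldots,1]^{T}, \qquad \mathbf{p}_n = \bigl[\varphi_n(\mathbf{x}_1, w_n),\ \ldots,\ \varphi_n(\mathbf{x}_{P_t}, w_n)\bigr]^{T}, \qquad (009)$$

with P_N^+ its Moore–Penrose pseudo-inverse.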
In equation (009), bold font is used for vectors in order not to confuse, for example, the multi-dimensional input vector x_1 with its one-dimensional components. The matrix P_N is a P_t × (N+1) matrix (P_t rows and N+1 columns). It can be noticed that if the matrix P_N is known, then the matrix P_{N+1} can be obtained by the recurrent equation:
The matrix P_N^+ is an (N+1) × P_t matrix and has some properties of an inverse matrix (true inverse matrices are defined only for square matrices; the pseudo-inverse P_N^+ is not square because in a properly designed net N << P_t). It can be calculated by the following recurrent equation:
where:
p_{N+1} = [φ_{N+1}(x_1, w_{N+1}), ..., φ_{N+1}(x_{P_t}, w_{N+1})]^T, (012)
In order to start using equations (010)-(013) for the recurrent calculation of the matrices P_{N+1} and P_{N+1}^+ from the matrices P_N and P_N^+, the initial conditions are defined as:
Then the equations (010)-(013) are applied in the following order for N = 0. First, the one-column matrix p_1 is calculated by equation (012). Then the matrix P_0 and the matrix p_1 are used in equation (010) to calculate the matrix P_1. After that, equation (013) calculates the one-column matrix k_1, using P_0, P_0^+ and p_1. Finally, equation (011) calculates the matrix P_1^+. That completes the calculation of P_1 and P_1^+ using P_0 and P_0^+. This process is further used for the calculation of the matrices P_N and P_N^+ for 2≦N≦Nmax.
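Equations (010), (011), (013) and (014) are not reproduced in the text above; a standard Greville-type column update consistent with the order of application just described is the following reconstruction (assuming the new column p_{N+1} is not in the span of the previous columns; the degenerate case is handled differently):

$$P_{N+1} \;=\; \bigl[P_N,\ \mathbf{p}_{N+1}\bigr], \qquad (010)$$
$$\mathbf{k}_{N+1} \;=\; \frac{\bigl(I - P_N P_N^{+}\bigr)\mathbf{p}_{N+1}}{\bigl\|\bigl(I - P_N P_N^{+}\bigr)\mathbf{p}_{N+1}\bigr\|^{2}}, \qquad (013)$$
$$P_{N+1}^{+} \;=\; \begin{bmatrix} P_N^{+}\bigl(I - \mathbf{p}_{N+1}\mathbf{k}_{N+1}^{T}\bigr) \\[2pt] \mathbf{k}_{N+1}^{T} \end{bmatrix}, \qquad (011)$$
$$P_0 \;=\; [1,\ldots,1]^{T}, \qquad P_0^{+} \;=\; \tfrac{1}{P_t}\,[1,\ldots,1]. \qquad (014)$$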
It can be seen that for any N the matrices P_N and P_N^+ satisfy the equation:
P_N^+ P_N = I_{N+1}, (015)
where I_{N+1} is the (N+1) × (N+1) unit matrix. At the same time, the matrix P_N P_N^+ is the matrix which projects any P_t-dimensional vector onto the linear subspace spanned by the vectors p_0, p_1, ..., p_N. That justifies the following equations:
w^ext = P_N^+ y^t,  ỹ^t = P_N w^ext, (016)
where:
Equations (010)-(013) describe the procedure of Recursive Linear Regression (RLR), which eventually provides net outputs for all local nets with N nodes, therefore allowing for calculation of training MSE by equation (017):
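Equation (017) is not shown above; the training MSE consistent with the quantities defined in equation (016) would be

$$e_{N,t} \;=\; \frac{1}{P_t}\sum_{p=1}^{P_t}\bigl(y_p^{t} - \tilde{y}_p^{t}\bigr)^{2} \;=\; \frac{1}{P_t}\,\bigl\|\,y^{t} - P_N w^{ext}\,\bigr\|^{2}. \qquad (017)$$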
After each calculation of e_{N,t}, the generalization (testing) error e_{N,g}, N = 0, 1, ..., N_max, is calculated by equation (018):
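Equation (018) is likewise not reproduced; the corresponding testing MSE on the generalization set would be

$$e_{N,g} \;=\; \frac{1}{P_g}\sum_{p=1}^{P_g}\bigl(y_p^{g} - \tilde{y}_p^{g}\bigr)^{2}, \qquad (018)$$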
where:
ỹ^g = [f̃_N(x_1^g, W_N), ..., f̃_N(x_{P_g}^g, W_N)]^T, (019)
It should be noted that the values of testing net outputs are calculated not by equations (010)-(016) but by the equation (001), which in this case looks like equations (020) and (021):
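Equation (020) is not reproduced above; under the reconstruction of equation (001) given earlier it would read

$$\tilde{f}_N(x, W_N) \;=\; w_0^{ext} + \sum_{n=1}^{N} w_n^{ext}\,\varphi_n\!\left(x,\, w_n^{int}\right), \qquad (020)$$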
where WN is the set of trained net parameters for a net with N nodes
W_N = {w_n^ext, n = 0, 1, ..., N; w_m^int, m = 1, ..., N}, (021)
After the process of training comes to an end with a net with N = Nmax, the procedure optNumberNodes(testMSE) calculates the optimal number of nodes N* ≦ Nmax and selects the single optimal net with the optimal number of nodes and the corresponding set of net parameters.
Adaptive Stochastic Optimization (ASO)
As noted hereinabove, the RLR operation is utilized to train the weights between the hidden nodes 502 and the output node 508. However, the ASO is utilized to train internal weights for the basis function to define the mapping between the input nodes 504 and hidden nodes 502. Since this is a higher dimensionality problem, the ASO solves this through a random search operation, as was described hereinabove with respect to
w_{N+1}^int = (w_{N+1,i}^int, i = 1, ..., d) (022)
and the related ensemble of nets f̃_{N+1}. The number of members in the ensemble equals numEnsmbl = Phase1 + Phase2, where Phase1 is the number of members in the first phase of the ensemble, while Phase2 is the number of members in the second phase. The default values of these parameters are Phase1 = 25, Phase2 = 75. The other values of the internal parameters w_1^int, ..., w_N^int for building the nets f̃_{N+1} are kept from the previous step of building the net f̃_N. This methodology of optimization is based on the literature, which says that, asymptotically, the training error obtained by optimization of the internal parameters of the last node only is of the same order as the training error obtained by optimization of all net parameters. That is why the internal parameters from the previous step are not changed, but the set of external parameters is completely recalculated and optimized with the RLR.
Thus, keeping the optimal values of the internal parameters w_1^int, ..., w_N^int from the previous step of building the optimal net with N nodes results in the creation of the ensemble of numEnsmbl possible values of the parameter w_{N+1}^int by generating a sequence of all one-dimensional components of this parameter, w_{N+1,i}^int, i = 1, ..., d, using an Adaptive Random Generator (ARG) for each component.
Referring now to
Referring now to
Each of the local networks, as described hereinabove, can have a different number of hidden nodes. As the ASO algorithm progresses, each node will have the weights thereof associated with the basis function determined and fixed, and then the output node will be determined by the RLR algorithm. Initially, the network is configured with a single hidden node and the network is optimized with that single hidden node. When the weights minimizing the training error are determined for the basis function of that single hidden node, the entire procedure is repeated with two nodes, and so on. (It may be that the algorithm starts with more than a single hidden node.) For this single hidden node, there may be a plurality of input nodes, which is typically the case. Thus, the above noted procedure with respect to
In
As a summary, the RLR and ASO procedures operate as follows. Suppose the final net consisting of N nodes has been built. It consists of N basis functions, each determined by its own multi-dimensional parameter w_n^int, n = 1, ..., N, connected in a linear net by the external parameters w_n^ext, n = 0, 1, ..., N. The process of training and testing basically consists of building a set of nets with N = 0, ..., Nmax nodes. The initialization of the process starts typically with N = 0 and then goes recursively from N to N+1 until reaching N = Nmax. Now the organization of the main step N → N+1 will be described. First, the connections between the first N nodes, provided by the external parameters, are canceled, while nodes 1, 2, ..., N, determined by their internal parameters, remain frozen from the previous recursive step. Secondly, to pick a good (N+1)-th node, an ensemble of candidate nodes is generated. Each member of the ensemble is determined by its own internal multi-dimensional parameter w_{N+1}^int and is generated by a specially constructed random generator. After each of these internal parameters is generated, there is provided a set of (N+1) nodes, which set can be combined in a net with (N+1) nodes by calculating the external parameters w_n^ext, n = 0, 1, ..., N+1. This procedure of recalculating all external parameters is not conventional but is attributed to the Ensemble Approach. The conventional asymptotic result described herein above requires only calculating the one external parameter w_{N+1}^ext. Calculating all external parameters is performed by a sequence of a few matrix algebra formulas called the RLR. After these calculations are made for a given member of the ensemble, the training MSE can be calculated. The ASO provides the intelligent organization of the ensemble so that the search for the best net in the ensemble (with minimum training MSE) will be the most efficient. The most difficult problem in multidimensional optimization (which is the task of training) is the existence of many local minima in the objective function (training MSE). The essence of the ASO is that the random search is organized so that as the size of the ensemble increases the number of local minima decreases, approaching one as the size of the ensemble approaches infinity. At the end of the ensemble, the net with minimal training error in the ensemble will be found, and only this net goes to the next step (N+1) → (N+2). Only for this best net with (N+1) nodes will the testing error be calculated. When N reaches Nmax, the whole set of best nets with N nodes, 0≦N≦Nmax, with their internal and external parameters will have been calculated. Then the procedure described herein above finds among this set of nets the single one with the optimal number of nodes N*, which means the net with minimal testing error.
Returning to the ASO procedure, it should be understood that random sampling of the internal parameter with its one-dimensional components means that the random generator is applied sequentially to each component, and only after that does the process go further.
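A minimal sketch of one recursive step N → N+1 of the EA, combining the ASO ensemble search over the new node's internal weights with a least-squares solve for all external weights. The names, the sampling scheme of the adaptive random generator, and the basis function are illustrative assumptions; ordinary least squares stands in here for the recursive formulas (010)-(013), and the two-phase adaptation is only approximated by narrowing the search around the current best candidate.

```python
import numpy as np

def basis(X, w_int):
    """Illustrative ridge basis function: sigmoid of a weighted sum of inputs."""
    return 1.0 / (1.0 + np.exp(-(X @ w_int)))

def train_step(X_train, y_train, frozen_w_int, num_ensemble=100, rng=None):
    """Add one hidden node: freeze existing internal weights, sample an ensemble
    of candidate internal weights for the new node, refit all external weights
    for each candidate, and keep the candidate with minimal training MSE."""
    rng = rng if rng is not None else np.random.default_rng(0)
    d = X_train.shape[1]
    best = None
    center, spread = np.zeros(d), 1.0            # flat start, narrowed adaptively
    for _ in range(num_ensemble):
        w_new = center + spread * rng.uniform(-1.0, 1.0, size=d)
        w_ints = frozen_w_int + [w_new]
        # Design matrix: constant column plus one column per hidden node.
        P = np.column_stack([np.ones(len(X_train))] +
                            [basis(X_train, w) for w in w_ints])
        w_ext, *_ = np.linalg.lstsq(P, y_train, rcond=None)   # all external weights
        mse = np.mean((y_train - P @ w_ext) ** 2)
        if best is None or mse < best[0]:
            best = (mse, w_new, w_ext)
            center, spread = w_new, spread * 0.9  # crude adaptive narrowing (assumption)
    mse, w_new, w_ext = best
    return frozen_w_int + [w_new], w_ext, mse
```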
Clustering
The ensemble net operation is based upon the clustering of data (both inputs and outputs) in a number of clusters.
The clustering algorithm that is utilized is the modified BIMSEC (basic iterative mean squared error clustering) algorithm. This algorithm is a sequential version of the well known K-Means algorithm. This algorithm is chosen, first, since it can be easily updated for new incoming data and, second, since it contains an explicit objective function for optimization. One deficiency of this algorithm is that it has a high sensitivity to the initial assignment of clusters, which can be overcome utilizing initialization techniques which are well known. In the initialization step, a random sample of data is generated (a sample size equal to 0.1 times the size of the set was chosen in all examples). The first two cluster centers are chosen as the pair of generated patterns with the largest distance between them. If n ≧ 2 cluster centers have been chosen and more are required, the following iterative procedure is applied. For each remaining pattern x in the sample, the minimal distance d_n(x) to the chosen cluster centers is determined. The pattern with the largest d_n(x) is chosen as the next, (n+1)-th cluster center.
The standard BIMSEC algorithm minimizes the following objective:
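Equation (023) is not reproduced above; the standard sum-of-squared-errors clustering criterion it refers to is

$$J_e \;=\; \sum_{i=1}^{c}\ \sum_{x \in D_i} \bigl\|\,x - m_i\,\bigr\|^{2}. \qquad (023)$$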
where c is the number of clusters and m_i is the center of cluster D_i, i = 1, ..., c. To control the size of the clusters, another objective has been added:
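Equation (024) is likewise not reproduced; one form consistent with the stated goal of keeping the cluster sizes close to uniform (an assumption) is

$$J_u \;=\; \sum_{i=1}^{c}\left(\,|D_i| - \frac{n}{c}\right)^{2}. \qquad (024)$$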
where n is the total number of patterns. Thus, the second objective is to keep the distribution of cluster sizes as close as possible to uniform. The total goal of clustering is to minimize the following objective:
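Equation (025), also missing above, is then the weighted combination of the two objectives,

$$J \;=\; \lambda\,J_e + \mu\,J_u. \qquad (025)$$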
where λ and μ are nonnegative weighting coefficients satisfying the condition λ + μ = 1. The proper weighting depends on knowledge of the values of J_e and J_u. A dynamic updating of λ and μ has been implemented by the following scheme. The total number of iterations is N/M. Suppose it is desired to keep λ = a, μ = 1 − a, 0 ≦ a ≦ 1. Then, at the end of each group s, s ≧ 1, λ and μ are updated by equation (026):
λ = a, μ = (1 − a) J_e^s / J_u^s if J_u^s ≧ J_e^s,
λ = a J_u^s / J_e^s, μ = 1 − a if J_u^s < J_e^s. (026)
The clustering algorithm is shown schematically below.
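A minimal sketch of a sequential clustering pass of the kind described above (a BIMSEC/k-means-style assignment that trades off distance to the cluster center against a cluster-size penalty). The exact reassignment rule and the names used are illustrative assumptions rather than the patented algorithm:

```python
import numpy as np

def cluster_pass(X, centers, lam=0.5, mu=0.5):
    """One sequential pass over the patterns: assign each pattern to the cluster
    minimizing a distance term (weight lam) plus a cluster-size penalty (weight mu),
    updating the chosen cluster's center incrementally.

    X       : array of shape (n, d) of patterns
    centers : float array of shape (c, d) of initial cluster centers (modified in place)
    """
    c, n = len(centers), len(X)
    counts = np.ones(c)                             # one pattern credited to each seed center
    labels = np.zeros(n, dtype=int)
    target = n / c                                  # uniform-size target
    for p, x in enumerate(X):
        dist = np.sum((centers - x) ** 2, axis=1)   # squared distance to each center
        size_penalty = (counts + 1 - target) ** 2   # penalize growing already-large clusters
        j = int(np.argmin(lam * dist + mu * size_penalty))
        labels[p] = j
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]  # incremental mean update
    return labels, centers
```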
Building Local Nets
The previous step, clustering, starts with normalizing the whole set of data assigned for learning. In building local nets, the data of each cluster is renormalized using local data minimal and maximal values of each one-dimensional input component. This locally normalized data is then utilized by the EA in building a set of local nets, one local net for each cluster. After training, the number of nodes for each of the trained local nets is optimized using the procedure optNumberNodes (testMSE) described hereinabove. Thus, in the following steps only these nets, uniquely selected by the criterion of test error from the sets of all trained local nets with the number of nodes N, 0≦N≦Nmax, are utilized, in particular, as the elements of the global net.
Building Global Net and Predicting New Pattern
After the local nets have been defined, it is then necessary to generalize these to provide a general output over the entire input space, i.e., the global net must be defined.
Denote the set of trained local nets described in the previous subsection as:
N_j(x), j = 1, ..., C, (027)
where N_j(x) is the trained local net for a cluster D_j, C being the number of clusters. The default value of C is C = 10 for a data set with the number of patterns P, 1000 ≦ P ≦ 5000, or C = 5 for a data set with 300 ≦ P ≦ 500. For 500 < P < 1000 the default value of C can be calculated by linear interpolation as C = 5 + (P − 500)/100.
The global net N(x) is defined as:
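Equation (028) is not reproduced above; consistent with the adjustable weights c_j described next, the global net is a weighted combination of the (cluster-weighted) local net outputs, i.e., a reconstruction of the form

$$N(x) \;=\; \sum_{j=1}^{C} c_j\,\tilde{N}_j(x), \qquad (028)$$

where Ñ_j(x) denotes the cluster-weighted local net output defined by equations (029) and (030).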
where the parameters c_j, j = 1, ..., C are adjustable on the total training set and comprise the global net weights. In order to train the network (the local nets already having been trained), the training data must be processed through the overall network in order to train the values of c_j. In order to train this net, data from the training set is utilized, it being noted that this data is scattered among the clusters. Therefore, it is necessary to determine to which cluster, and hence to which local net, each pattern belongs.
For an arbitrary input pattern from the training set x = x_k, the value of Ñ_j(x) is defined as:
temp = ∥x_k − m_j∥ / (0.01 · dLessIntra_j · Intra_j), (030)
Intra_j and dLessIntra_j are the clustering parameters. The parameter Intra_j is defined as the shortest distance between the center m_j of the cluster D_j and a pattern from the training set outside this cluster. The parameter dLessIntra_j is defined as the number of patterns from the cluster D_j having distance less than Intra_j, expressed as a percentage of the cluster size. Thus, the global net is defined for the elements of the training set. For any other input pattern, the cluster whose center is at minimum distance from the pattern is first determined. Then the input pattern is declared temporarily as an element of this cluster, and equations (029) and (030) can be applied to this pattern as an element of the training set for calculation of the global net output. The target value of the plant output is assumed to become known by the moment of appearance of the next new pattern or a few seconds before that moment.
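A minimal sketch of the prediction flow for a new pattern as described above: assign the pattern to the nearest cluster, evaluate the local nets, and combine them with the trained global weights c_j. The membership/gating function of equation (029) is not reconstructed here, so a simple placeholder that passes each local net output straight through is used; all names are illustrative.

```python
import numpy as np

def predict_global(x, local_nets, cluster_centers, c_weights):
    """Predict the global net output for a new input pattern x.

    local_nets      : list of callables, local_nets[j](x) -> local prediction
    cluster_centers : array of shape (C, d) with the cluster centers m_j
    c_weights       : array of shape (C,) with the trained global weights c_j
    """
    # Temporarily assign the new pattern to the closest cluster (by center distance).
    dists = np.linalg.norm(cluster_centers - x, axis=1)
    nearest = int(np.argmin(dists))
    # Placeholder gating: every local net contributes its raw output; the actual
    # N~_j(x) of equations (029)-(030) would weight these by cluster membership.
    z = np.array([net(x) for net in local_nets])
    return float(c_weights @ z), nearest
```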
Retraining Local Nets
Referring now to
Training/Retraining the Global Net
Referring now to
Each of the outputs from the local nets for each of the patterns constitutes a new predicted pattern which is referred to as a “Z-value,” which is a predicted output value for a given pattern, defined as z = Ñ_j(x). Therefore, for each pattern, there will be an historical input value and a predicted output value for each net. If there are 100 networks, then there will be 100 Z-values for each pattern, and these are stored in a memory 1506 during the training operation of the global net. These will be used for the later retraining operation. During training of the global net, all that is necessary is to output the stored Z-values for the input training data and then input to the output layer of the global net the associated target (y_t) value for the purpose of training the global weights, represented by weights 1508. As noted hereinabove, this is trained utilizing the RLR algorithm. During this training, the input values of each pattern are applied, the resulting output is compared to the target output (y_t) associated with that particular pattern, an error is generated, and then the training operation continues. It is noted that, since the local nets 1502 are already trained, this then becomes a linear network.
For a retraining operation wherein a new pattern is received, it is only necessary for one local net 1502 to be retrained, since the input pattern will only reside in a single one of the clusters associated with only a single one of the local networks 1502. To maintain computational efficiency, it is only necessary to retrain that network and, therefore, it is only necessary to generate new outputs from that retrained local net 1502, since the output values for all of the training patterns for the unmodified local nets 1502 are already stored in the memory 1506. Therefore, for each input pattern, only one local network, the modified one, is required to calculate a new Z-value; the other Z-values for the other local nets are simply fetched from the memory 1506, and then the weights 1508 are retrained.
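A minimal sketch of the retraining bookkeeping described above: the Z-values (local net outputs on the training patterns) are cached, only the one retrained local net recomputes its column, and the global weights are then refit by linear regression. Names and the least-squares call are illustrative; the patent uses the RLR procedure for this fit.

```python
import numpy as np

def retrain_global(Z, y_train, local_nets, X_train, modified_net_index):
    """Refit the global weights after retraining a single local net.

    Z       : cached matrix of shape (P, C); Z[p, j] is local net j's output for pattern p
    y_train : target outputs, shape (P,)
    """
    j = modified_net_index
    # Only the modified local net must recompute its Z-values; the rest come from cache.
    Z[:, j] = np.array([local_nets[j](x) for x in X_train])
    # Since the local nets are fixed, fitting the global weights is a linear problem.
    c_weights, *_ = np.linalg.lstsq(Z, y_train, rcond=None)
    return c_weights, Z
```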
Referring now to
Referring now to
Referring now to
The measured NOx output and the MVs and DVs are input to a controller 1816 which also provides an optimizer operation. This is utilized in a feedback mode, in one embodiment, to receive various desired values and then to optimize the operation of the plant by predicting a future control input value MV(t+1) that will change the values of the manipulatable variables. This optimization is performed in view of various constraints such that the desired value can be achieved through the use of the neural network model. The measured NOx is utilized typically as a bias adjust such that the prediction provided by the neural network can be compared to the actual measured value to determine if there is any error in the prediction provided by the neural network. The neural network utilizes the globally generalized ensemble model which is comprised of a plurality of locally trained local nets with a generalized global network for combining the outputs thereof to provide a single global output (noting that more than one output can be provided by the overall neural network).
Referring now to
When the plant consists of a power generation unit, there are a number of parameters that are controllable. The controllable parameters can be NOx output, CO output, steam reheat temperature, boiler efficiency, opacity and/or heat rate.
It will be appreciated by those skilled in the art having the benefit of this disclosure that this invention provides a non-linear network representation of a system utilizing a plurality of local nets trained on select portions of an input space and then generalized over all of the local nets to provide a generalized output. It should be understood that the drawings and detailed description herein are to be regarded in an illustrative rather than a restrictive manner, and are not intended to limit the invention to the particular forms and examples disclosed. On the contrary, the invention includes any further modifications, changes, rearrangements, substitutions, alternatives, design choices, and embodiments apparent to those of ordinary skill in the art, without departing from the spirit and scope of this invention, as defined by the following claims. Thus, it is intended that the following claims be interpreted to embrace all such further modifications, changes, rearrangements, substitutions, alternatives, design choices, and embodiments.
This application is related to U.S. patent application Ser. No. 10/982,139, filed Nov. 4, 2004, entitled “NON-LINEAR MODEL WITH DISTURBANCE REJECTION,” (Atty. Dkt. No. PEGT-26,907), which is incorporated herein by reference.