1. Field of the Invention
The present invention relates to a data collection method for a process margin monitoring system of industrial equipment and a storage medium for storing the same, and more particularly to a data collection method for a process margin monitoring system of industrial equipment that is capable of collecting learning data from a database of a computer in a power plant and converting the data into a form in which the data can be easily learned in realizing a monitoring system for analyzing process margin of industrial equipment based on a statistical learning method and a storage medium for storing the same.
2. Description of the Related Art
Industrial equipment includes a plurality of systems and instruments for achieving a specific purpose. Generally, one or more measuring instruments for confirming an operation and safety state of the industrial equipment are installed such that the operation and safety state of the industrial equipment can be measured offline or online.
Efficiency and safety of the industrial equipment are changed depending upon external conditions (temperature, pressure, or humidity of the atmosphere; temperature of seawater or rainfall in a case in which a coolant is needed), characteristics of fuel supplied to the industrial equipment, a degradation degree of the industrial equipment, and an operation range. In terms of cost, a change range in which the efficiency and safety of the industrial equipment are maintained is called process margin. Most industrial equipment has a stoppage/protection function for stopping/protecting a specific system or instrument in order to prevent the operation of the industrial equipment exceeding such process margin. In order to realize such a stoppage/protection function, a control device is provided for forcibly stopping the industrial equipment if a value of a specific operation variable exceeds a set value for stoppage/protection.
The process margin and the set value for stoppage/protection are interdependent variables. If the set value for stoppage/protection is set to too large a value, the process margin is relatively increased, and therefore, the cost benefit obtained by operating the industrial equipment is increased; however, serious accidents may occur with the result that the industrial equipment may be stopped for a long period of time. On the other hand, if the set value for stoppage/protection is set to too small a value, the probability of accident occurrence is lowered; however, the process margin is decreased with the result that the industrial equipment may frequently be stopped, and therefore, the cost benefit obtained by operating the industrial equipment is decreased.
Therefore, both of these facets should be considered when deciding overall process margin. As a high degree of safety is required, a conservative value that accounts for external conditions, supplied fuel, a degradation degree of the industrial equipment, and an operation range is generally used to decide process margin.
However, it is very difficult to decide overall process margin in various situations, such as external conditions, supplied fuel, a degradation degree of the industrial equipment, and an operation range.
On the other hand, a set value for preliminary stoppage/protection is generally provided so that an operator can prepare for the stoppage of the industrial equipment or can take proper measures to normalize the industrial equipment before the value of the specific operation variable reaches the set value for stoppage/protection.
However, such a set value for preliminary stoppage/protection is generally a static value that is not changed once it is set. Even where the value is variable, it is set only as a function of one or two conditions indicating characteristics of the industrial equipment.
Therefore, even if the process is within the above set value for stoppage/protection, it cannot be determined whether the process is really normal or abnormal. Also, it is difficult to predict how long it will take for a process problem to drive the operation variable to the set value. For this reason, it is impossible to take a proper measure until a very tense situation has already arisen.
To solve the above-mentioned conventional problems, a technology is well known that performs dynamic monitoring and issues a timely alarm with respect to a stoppage/protection signal of the industrial equipment based on a series of statistical learning and prediction models.
Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a data collection method for a process margin monitoring system of industrial equipment that is capable of collecting learning data from a database of a computer in a power plant and converting the data into a form in which the data can be easily learned in realizing a monitoring system for analyzing process margin of industrial equipment based on a statistical learning method and a storage medium for storing the same.
In accordance with an aspect of the present invention, the above and other objects can be accomplished by the provision of a data collection method for a process margin monitoring system of industrial equipment, including preparing a learning data set based on data determined to be normal in an operation history of the industrial equipment so that the learning data set is sorted for each operation mode, in a case in which the industrial equipment includes a plurality of equipment units performing the same functions, receiving data for each of the equipment units and processing the received data as data for the equipment units, sorting and grouping associated ones of the data in the learning data set, and sampling the collected data to reduce the number of data.
The learning data set may include a first data set to an N-th data set (N being a natural number equal to or greater than 2) depending upon a scale of data to be collected or time when data are collected.
The first data set may include signals related to a specific equipment unit of the industrial equipment for monitoring process margin of the specific equipment unit, the second data set may include signals related to the entirety of the industrial equipment for monitoring process margin of the entirety of the industrial equipment, and the third data set may include signals regarding the entirety or a portion of the industrial equipment immediately after a specific event is generated in the entirety or the portion of the industrial equipment.
The data collection method may further include, in a case in which the learning data set comprises data displayed as digital signals, collecting an analog signal that can substitute for the digital signal and converting the digital signal into the analog signal.
The grouping step may include regarding variables, a correlation coefficient between which is equal to or greater than a set value, as belonging to the same group, calculating a smoothness parameter with respect to the variables regarded as belonging to the same group using a 4-fold validation method, putting combinations of all variables in the group besides the variables regarded as belonging to the same group to calculate a square sum of residuals while calculating the smoothness parameter using the 4-fold validation method, and, in a case in which a decrease ratio of a square sum of residuals immediately after a square sum of specific residuals to the square sum of specific residuals is equal to or less than a set value, terminating grouping at a time when the square sum of specific residuals is calculated.
The step of calculating the square sum of residuals may include sorting and using only variables related to characteristics of the equipment among the variables besides the variables regarded as belonging to the same group in consideration of characteristics of the equipment.
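As a rough illustration of the grouping step, the following sketch selects a smoothness parameter by 4-fold validation and evaluates the termination rule on the square sum of residuals. It assumes a Gaussian-kernel (Nadaraya-Watson) regression in which the bandwidth `h` plays the role of the smoothness parameter; the regression form is not stated in this passage, so that choice and the names `kernel_predict`, `cv_sse`, `pick_smoothness`, and `should_stop` are hypothetical.

```python
import math

def kernel_predict(train_x, train_y, x, h):
    """Nadaraya-Watson estimate at x with a Gaussian kernel of bandwidth h (assumed model)."""
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(tx, x)) / (2.0 * h * h))
               for tx in train_x]
    total = sum(weights)
    if total == 0.0:
        # Every training point is too far away; fall back to the plain mean.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

def cv_sse(xs, ys, h, folds=4):
    """Square sum of residuals accumulated over the held-out folds."""
    sse = 0.0
    for k in range(folds):
        train_x = [x for i, x in enumerate(xs) if i % folds != k]
        train_y = [y for i, y in enumerate(ys) if i % folds != k]
        for i, (x, y) in enumerate(zip(xs, ys)):
            if i % folds == k:
                sse += (y - kernel_predict(train_x, train_y, x, h)) ** 2
    return sse

def pick_smoothness(xs, ys, candidates):
    """Return (h, sse) for the candidate with the smallest cross-validated residual sum."""
    return min(((h, cv_sse(xs, ys, h)) for h in candidates), key=lambda t: t[1])

def should_stop(sse_now, sse_next, set_value):
    """Terminate when the relative decrease of the residual sum is at or below set_value."""
    if sse_now == 0.0:
        return True
    return (sse_now - sse_next) / sse_now <= set_value

# Hypothetical data: one predictor variable and a smoothly dependent target.
xs = [(i / 10.0,) for i in range(40)]
ys = [math.sin(x[0]) for x in xs]
best_h, best_sse = pick_smoothness(xs, ys, [0.05, 0.2, 1.0, 5.0])
```

In a stepwise loop, `cv_sse` would be recomputed each time a candidate variable is added to the group, and `should_stop` would be applied to consecutive residual sums.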
The correlation coefficient may be analyzed by the following mathematical expression.
Where, ρXY indicates a correlation coefficient between variables X and Y, Xi indicates an i-th value on the basis of a sampling section of learning data, Yi indicates an i-th value on the basis of a sampling section of learning data (Y is a variable different than X), μX indicates the average of a variable X, μY indicates the average of a variable Y, σX indicates standard deviation of a variable X, σY indicates standard deviation of a variable Y, and N indicates the number of data collection intervals in a sampling section of learning data.
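The expression itself is not reproduced in this text. Given the symbol definitions above, it is presumably the standard Pearson form; the following is a reconstruction under that assumption, and the 1/N normalization (rather than 1/(N−1)) and the index range are likewise assumed.

```latex
\rho_{XY} = \frac{1}{N} \sum_{i=1}^{N} \frac{(X_i - \mu_X)(Y_i - \mu_Y)}{\sigma_X \, \sigma_Y}
```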
The data sampling step may include performing dispersion of a value of a specific variable on the basis of a grid size to reduce the number of data related to the variable in a corresponding grid.
The data sampling step may include calculating standard deviation (σX) of a value of a specific variable and reducing the number of data related to the variable in a corresponding grid on the basis of a grid size (GridSizeX) calculated by the following mathematical expression according to set resolution.
The number of data left in the grid may be decided by the product of the number of data related to the variable in the corresponding grid and a set rate, and at least one of the data is left in each grid.
The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Now, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so as to explain the present invention in detail to such an extent that a person having ordinary skill in the art to which the present invention pertains can easily make the present invention. The object, operation, and effects, and, in addition, other objects, features, and operational advantages of the present invention will be more clearly understood from the following detailed description.
For reference, embodiments disclosed in this specification are selected from several possible embodiments and presented as the most preferred embodiments to assist those skilled in the art to understand the present invention. Therefore, the technical concept of the present invention is not restricted or limited to the disclosed embodiments, and it should be understood that various modifications, additions and substitutions are possible, and, in addition, equivalents thereof are also possible, without departing from the technical concept of the present invention.
A process margin monitoring system for issuing a timely alarm about process margin based on a statistical learning and prediction model has been developed. The process margin monitoring system is characterized by distinguishing between errors of a measuring instrument and abnormality of equipment using statistical data (hereinafter, referred to as “learning data”) obtained from an operation history of the equipment.
Accuracy of the process margin monitoring system depends upon how reliably learning data are collected from the operation history of the equipment and how the collected learning data are grouped so that the learning data can be used to construct a prediction model.
Conditions required to improve accuracy of the process margin monitoring system may be divided into the following detailed items.
(1) How to Collect Data
This is a method of selecting time when collection of learning data from a database installed in a computer of a power plant is started and time when collection of learning data from the database is ended.
(2) How to Collect Data in a Case in which Power Generation Equipment is Normally Operated and in a Case in which the Power Generation Equipment is not Normally Operated
A normal state means that the equipment is maintained in a stable state without change of operation conditions.
Generally, data collected at that time are useful to construct a statistical model. On the other hand, data obtained when the state of the power generation equipment is changed due to start, stoppage, or various control logics are not useful to construct a statistical model. For this reason, it is necessary to provide a method of collecting data from the database installed in the computer of the power plant while distinguishing between a normal state and an abnormal state and inputting the collected data to the process margin monitoring system.
(3) How to Collect Analog Data and Digital Data
Unlike an analog signal indicating a general process signal, a digital signal for mainly indicating operation states of equipment, such as an open or closed state of a valve and an operation/stoppage state of a pump, plays an important role, but a problem occurs when the digital signal is reflected in a statistical learning model developed based on analog data. For this reason, it is necessary to provide a method of receiving digital data from the database installed in the computer of the power plant and inputting the received digital data to the process margin monitoring system.
(4) How to Process Data Having the Same Characteristics Provided by a Plurality of Equipment Units
In many cases, an industrial equipment unit for performing an important function has one or more backup equipment units that are capable of performing the same function. For example, in a case in which several pumps are operated while another pump remains stopped, and one of the pumps under operation is stopped for a certain reason, the pump remaining stopped is operated to substitute for the failed pump. In this case, the total number of equipment units that are operated is not changed, and therefore, the operation condition is not changed. In providing a user with monitoring results, however, a portion to be changed is generated since there is a change in the equipment units under operation. That is, it is necessary to provide a method of receiving data having the same characteristics provided by a plurality of equipment units from the database installed in the computer of the power plant, processing the received data, and inputting the processed data to the process margin monitoring system.
(5) How to Select an Optimal Combination in Grouping Data
A signal list for monitoring the power generation equipment is generally enormous. Such a signal list includes not only signals important to confirm process margin of the equipment but also unnecessary signals. The simplest grouping method is confirming a correlation coefficient between signals and grouping signals having high correlation. However, grouping results may not be consistent depending upon a collection policy of learning data. Therefore, it is necessary to provide a method of grouping data based on a statistical method and engineering knowledge of equipment and inputting the grouped data to the process margin monitoring system.
(6) How to Reduce Collected Data to Such an Extent that Learning is Really Possible
Generally, if the sampling interval is very short, the amount of collected data is enormous even when data are collected for only a short period of time. Also, for large-sized power generation equipment, the signal list to be monitored is very large. For this reason, it is not easy to process the huge amount of calculation necessary to construct a statistical learning model even when a high-performance computer is used. Therefore, it is necessary to provide a method of reducing collected data with minimum loss so that the data can actually be learned and inputting the reduced data to the process margin monitoring system.
Hereinafter, methods of satisfying conditions required to improve accuracy of the process margin monitoring system will be described in detail according to the respective detailed items thereof.
(1) Collection of Data (Construction of Multiple Learning Data Sets)
Ideal learning data must be obtained from operation conditions of normal equipment having no deterioration with time and no lowering of efficiency. Also, such ideal learning data must include operation data based on the combination of all external conditions (temperature, pressure, or humidity of the atmosphere; temperature of seawater or rainfall in a case in which a coolant is needed) and all internal conditions (characteristics of supplied fuel or an operation range). Since it is impossible to perfectly collect such data in actuality, however, learning data are prepared using the following method.
First, two or more learning data sets are constructed. Since learning data function as a reference target which is compared with a present equipment state, multiple learning data sets may be constructed correspondingly. Consequently, the learning data sets may include a first data set, a second data set, a third data set . . . , and an N-th data set (N being a natural number) depending upon the scale of data to be collected or the time when data are collected.
On the assumption that three learning data sets are constructed as shown in
A statistical learning method is divided into a learning mode and an execution mode. Each of the multiple learning data sets is used to generate a model in the learning mode, and provides a proper interface, by which a user can select one of the multiple learning data sets when the execution mode is commenced.
(2) Collection of Data in a Case in which the Equipment is Normally Operated and in a Case in which the Equipment is not Normally Operated (Collection of Learning Data for each Operation Mode)
For most equipment, the equipment is started from a state in which the equipment is stopped, an operation state of the equipment is maintained in a predetermined state, and then the equipment is stopped after a predetermined time.
Consequently, the mode of the equipment may be divided into a start mode, a normal operation mode, and a stop mode. According to circumstances, the operation mode may be subdivided. When collecting learning data, data sets are sorted on a per operation mode basis. In a case in which data are sorted for each of the operation modes, grouping reliability is increased, and a model is simplified, whereby accuracy of the overall monitoring system is improved. Consequently, learning data are sorted and collected for each operation mode using the multiple learning data selection method described in paragraph (1).
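Sorting learning data on a per operation mode basis can be sketched as follows. The sketch assumes that each record already carries a mode label; how that label is derived from plant signals is not described here, and the field names `mode` and `flow` and the function `sort_by_mode` are hypothetical.

```python
from collections import defaultdict

def sort_by_mode(records):
    """Build one learning data set per operation mode."""
    by_mode = defaultdict(list)
    for rec in records:
        by_mode[rec["mode"]].append(rec)
    return dict(by_mode)

# Hypothetical operation history with a mode label on every record.
records = [
    {"t": 0, "mode": "start", "flow": 1.0},
    {"t": 1, "mode": "normal", "flow": 5.0},
    {"t": 2, "mode": "normal", "flow": 5.1},
    {"t": 3, "mode": "stop", "flow": 0.4},
]
data_sets = sort_by_mode(records)
```

Each entry of `data_sets` would then be used to train a separate model for its mode.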
That is, a model suitable for the corresponding operation mode is used in the execution mode. In a case in which monitoring is performed only in a specific operation mode, such monitoring is performed only when the input data are obtained in an operation condition not exceeding the data range prepared in the learning mode. In a case in which the state of the system does not satisfy this condition, an alarm indicating that reliability of the output result is low is issued to the user, or the calculation is automatically bypassed.
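A minimal sketch of this execution-mode check follows. It assumes the "data range prepared in the learning mode" is the per-variable minimum and maximum of the learning data, and it bypasses the calculation whenever an input falls outside that range; the function names and the `low_reliability` status string are hypothetical.

```python
def learned_ranges(learning_rows):
    """Per-variable (min, max) over the learning data."""
    names = learning_rows[0].keys()
    return {n: (min(r[n] for r in learning_rows),
                max(r[n] for r in learning_rows)) for n in names}

def within_learned_range(sample, ranges):
    """True when every variable of the sample lies inside its learned range."""
    return all(lo <= sample[n] <= hi for n, (lo, hi) in ranges.items())

def monitor(sample, ranges, model):
    """Run the model only inside the learned range; otherwise flag low reliability."""
    if not within_learned_range(sample, ranges):
        return {"status": "low_reliability", "prediction": None}
    return {"status": "ok", "prediction": model(sample)}

# Hypothetical learning data and a stand-in model.
learning = [{"load": 50.0, "temp": 300.0}, {"load": 80.0, "temp": 320.0}]
ranges = learned_ranges(learning)
inside = monitor({"load": 60.0, "temp": 310.0}, ranges, lambda s: s["load"] * 2)
outside = monitor({"load": 95.0, "temp": 310.0}, ranges, lambda s: s["load"] * 2)
```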
(3) Collection of Analog Data and Digital Data
If modeling is difficult in using the statistical learning method in a case in which learning data include a digital signal, learning data may be collected using an analog signal that can substitute for the digital signal. For example, if modeling of a digital signal indicating an open or closed state of a valve is difficult, flow rate, pressure, or temperature at a pipe located downstream of the valve is included in the learning data so that the open or closed state of the valve can be indirectly known.
If kernel regression analysis is used as a model of the learning data, analog data and digital data may be mixed. Also, important digital data must be designated as the same group as the learning data. In a grouping method based only on a linear correlation coefficient used in the existing statistical learning method, important digital data may be lost during grouping. For this reason, a method of finding an optimal grouping combination, which will be described below, must be utilized.
In the execution mode, however, the result for a digital signal may not be exactly 0 or 1; it may be an intermediate value or a value outside the range of 0 to 1. In this case, it is determined that the opening/closing or stop/operation indication carried by the digital signal may be incorrect.
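The check on a digital signal in the execution mode might look like the following sketch. The tolerance around 0 and 1 is an assumption introduced for illustration, since no threshold is given here, and the function name `check_digital_estimate` is hypothetical.

```python
def check_digital_estimate(value, tolerance=0.1):
    """Return (state, plausible): state is 0 or 1 when the estimate is close to either."""
    if abs(value) <= tolerance:
        return (0, True)
    if abs(value - 1.0) <= tolerance:
        return (1, True)
    # Intermediate or out-of-range result: the indicated state may be incorrect.
    return (None, False)
```

For example, an estimate of 0.5 for a valve position signal would be reported as implausible rather than rounded to open or closed.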
(4) Processing of Data Having the Same Characteristics Provided by a Plurality of Equipment Units (Creation of Imaginary Analog/Digital Tags)
Learning data are not collected on an equipment basis but on a function basis. In a case in which data having the same characteristics are provided by a plurality of equipment units, therefore, imaginary tags are given. In order to give such imaginary tags, it is assumed that three of the four pumps 4a, 4b, 4c, and 4d are operated, and the remaining one is stopped so that it can be operated in case of emergency, as shown in
A concept of such an imaginary tag may be used to indicate a position at which a measuring instrument is not really installed although a signal is required, a position at which such a measuring instrument cannot be installed, or a physical quantity that cannot be measured directly. For example, if it is wished to utilize enthalpy as a signal at the points H1 to H4 of
(5) Selection of an Optimal Combination in Grouping Data (Stepwise Variable Selection and Cross Grouping)
In order to improve accuracy of grouping, various kinds of singularity included in learning data must be basically removed. Representative examples of singularity may include a case in which data are not input, such as ‘Bad input’ and a case in which data are input but are large or small to such an extent that the data temporarily deviate excessively from a normal range, such as ‘Out of range.’ In a case in which data having such singularity are generated, data of all variables acquired at that time are simultaneously removed to improve reliability of learning data. All variables having no change during sampling of the learning data are processed as ‘Bad input’ so that the variables cannot function as noise in modeling.
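The singularity removal described above can be sketched as follows. The sketch assumes that a missing value (`None`) stands for 'Bad input', that per-variable limits define 'Out of range', and that a variable with no change is dropped from the data set as a whole; these representations and the name `clean_learning_data` are hypothetical.

```python
def clean_learning_data(rows, limits):
    """Drop unchanging variables, then drop every time point containing a singularity."""
    names = list(rows[0].keys())
    # A variable with at most one distinct valid value never changes.
    constant = {n for n in names
                if len({r[n] for r in rows if r[n] is not None}) <= 1}
    kept_names = [n for n in names if n not in constant]
    cleaned = []
    for r in rows:
        ok = True
        for n in kept_names:
            lo, hi = limits[n]
            if r[n] is None or not (lo <= r[n] <= hi):
                ok = False  # one bad variable removes the whole time point
                break
        if ok:
            cleaned.append({n: r[n] for n in kept_names})
    return cleaned, sorted(constant)

# Hypothetical samples: 'c' never changes, row 2 has a bad input, row 3 is out of range.
rows = [
    {"p": 1.0, "t": 300.0, "c": 7.0},
    {"p": None, "t": 301.0, "c": 7.0},
    {"p": 1.2, "t": 9999.0, "c": 7.0},
    {"p": 1.1, "t": 302.0, "c": 7.0},
]
limits = {"p": (0.0, 5.0), "t": (200.0, 400.0), "c": (0.0, 10.0)}
cleaned, dropped = clean_learning_data(rows, limits)
```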
Learning data include information that is useful for informing a user of the state of a specific equipment unit and information that is not. Also, even signals that include useful information do not each indicate the states of all of the equipment units in the system. For this reason, it is necessary to group signals including information useful to inspect a state of each of the equipment units to be inspected. When the grouping is performed as described above, it is possible to remove signals including useless information from the learning data, thereby reducing the number of signals necessary to monitor a specific equipment unit to an appropriate level.
Generally, a correlation coefficient used as a basis of grouping in the statistical learning method is analyzed with respect to all variable pairs constituting learning data, and is calculated as represented by the following mathematical expression 1. If the calculated value of the correlation coefficient is equal to or greater than a set value, the variables are regarded as the learning data. On the other hand, if the calculated value of the correlation coefficient is less than the set value, the variables are excluded from the learning data. The set value is input by a user.
Where, ρXY indicates a correlation coefficient between variables X and Y, Xi indicates an i-th value on the basis of a sampling section of learning data, Yi indicates an i-th value on the basis of a sampling section of learning data (Y is a variable different than X), μX indicates the average of a variable X, μY indicates the average of a variable Y, σX indicates standard deviation of a variable X, σY indicates standard deviation of a variable Y, and N indicates the number of data collection intervals in a sampling section of learning data.
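A minimal sketch of correlation-based pairing under the definitions above follows. It assumes the standard Pearson form with 1/N normalization and compares the absolute value of the coefficient with the set value; both choices, as well as the signal names and function names, are assumptions made for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient with population (1/N) normalization."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mu_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mu_y) ** 2 for y in ys) / n)
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # an unchanging variable carries no correlation information
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    return cov / (sd_x * sd_y)

def correlated_pairs(signals, set_value):
    """List variable pairs whose |correlation| is equal to or greater than set_value."""
    names = sorted(signals)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(correlation(signals[a], signals[b])) >= set_value:
                pairs.append((a, b))
    return pairs

# Hypothetical signals: pressure follows flow, ambient temperature barely moves.
signals = {
    "flow": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pressure": [2.1, 3.9, 6.2, 8.0, 9.9],
    "ambient": [20.0, 20.1, 19.9, 20.0, 20.05],
}
pairs = correlated_pairs(signals, 0.8)
```

In this toy data the slowly varying `ambient` signal is paired with nothing, which is the kind of omission a purely correlation-based grouping can produce.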
However, grouping depending on the correlation coefficient as described above has the following two problems.
First, a correlation coefficient between variables which should have a physical relationship is very low with the result that the variables may not belong to the same group. A correlation coefficient indicates a linear relationship between two variables. However, linearity of two certain variables may be differently analyzed depending upon a sampling period of learning data. For example, variables, such as an outside air condition, a seawater or rainfall condition, and a fuel condition, affect overall performance of the power generation equipment but are not sufficiently reflected in the correlation coefficient since such variables change much more slowly than a process change of the equipment. Such variables may be regarded as independent variables of the overall system. That is, change of the system does not affect such variables, but such variables affect change of the system.
Second, if such variables belong to a specific group, the variables cannot belong to other groups. Since independent variables of the system affect all groups, it is necessary for a plurality of groups to have the independent variables jointly.
Consequently, a stepwise variable selection method is suggested as follows in order to more precisely construct grouping.
The second problem is automatically solved using the stepwise variable selection method as described above. Stepwise variable selection results and cross variable grouping results are shown in
(6) A Method of Reducing Collected Data to Such an Extent that Learning is Actually Possible
The amount of learning data that can actually be collected is too large to be analyzed even by the latest computers. In this case, a huge amount of time is necessary for the stepwise variable selection and cross grouping of paragraph (5).
In order to solve this problem, dispersion of a signal is performed on the basis of a grid size, and a method of reducing the number of data in corresponding data is suggested as follows. First, dispersion of a value of a specific variable is calculated, and the calculated dispersion is set as a reference grid size. A user may set the reference grid size to be large or small. Next, a grid is set for each variable, and real data are dotted in each grid.
For a system having two variables, in a case in which variables are divided into grids having a predetermined resolution for each variable, and duplicated data are removed from one grid, if the duplicated data are present in the grid, the result data can be reduced as shown in
When the resolution is decided by the learning setting interface, the range from −5σ to +5σ about the average of the corresponding variable is divided into equal parts, the number of which corresponds to the resolution. The range from −5σ to +5σ, rather than the range from the minimum value to the maximum value of the variable, is divided because abnormally large or small values may occasionally be included in learning data, and therefore, if the minimum value and the maximum value of the variable were used, the grids might be abnormally distributed. Variables are generally normally distributed, and therefore, most data lie between −5σ and +5σ of the variable. For example, when the resolution is set to 4, the variable may be divided into four grids, i.e. a grid of −5σ to −2.5σ, a grid of −2.5σ to the average, a grid of the average to +2.5σ, and a grid of +2.5σ to +5σ. On the other hand, when the resolution is set to 2, the variable may be divided into two grids, i.e. a grid of −5σ to the average and a grid of the average to +5σ.
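The grid division can be sketched as follows. The grid size of 10σ divided by the resolution follows from dividing the −5σ to +5σ range into equal parts; clamping values outside that range into the outermost grids is an assumption, as are the function names, and σ is assumed to be greater than zero.

```python
import math

def grid_size(sigma, resolution):
    """Width of one grid when the -5 sigma to +5 sigma range is split into equal parts."""
    return 10.0 * sigma / resolution

def grid_index(value, mean, sigma, resolution):
    """Index (0-based) of the grid that contains value."""
    size = grid_size(sigma, resolution)
    idx = math.floor((value - (mean - 5.0 * sigma)) / size)
    # Values beyond +/- 5 sigma are placed in the outermost grids (assumption).
    return max(0, min(resolution - 1, idx))
```

With a resolution of 4, an average of 0, and σ of 1, each grid is 2.5 wide, so a value of −1 falls in the second grid (index 1).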
Next, a predetermined rate or a certain rate input by a user is used to reduce the number of data included in each grid. The number of data in each of the grids is reduced according to such a rate. Although data are reduced according to this rate, at least one of the data must be left.
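The reduction step can be sketched as follows. The number of data kept in each grid is the product of the member count and the rate, with a minimum of one; which members are kept is not specified, so evenly spaced selection is assumed here, and `reduce_by_grid` and `cell_of` are hypothetical names.

```python
from collections import defaultdict

def reduce_by_grid(points, cell_of, rate):
    """Keep about rate * count points per grid, but never fewer than one."""
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p)].append(p)
    kept = []
    for members in cells.values():
        n_keep = max(1, int(len(members) * rate))
        step = len(members) / n_keep
        # Evenly spaced picks within the grid (assumed selection policy).
        kept.extend(members[int(i * step)] for i in range(n_keep))
    return kept

# Hypothetical one-variable data with grids of width 10.
dense = reduce_by_grid(list(range(20)), lambda v: v // 10, 0.2)
sparse = reduce_by_grid([0, 1, 2, 50], lambda v: v // 10, 0.1)
```

For grouped variables, `cell_of` would return a tuple of grid indices, one per variable in the group, so that duplicates are judged across the whole group.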
The data compression method may be variously used in the statistical learning method. In order to achieve the greatest effect, variables must be grouped first, and then data compression must be performed in the same group. This is because if the data compression method is applied to a signal upon which signal processing is not performed, compression effects may be reduced.
As is apparent from the above description, the present invention has the effect of collecting learning data from a database of a computer in a power plant and converting the data into a form in which the data can be easily learned in realizing a monitoring system for analyzing process margin of industrial equipment based on a statistical learning method.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2010-0035755 | Apr 2010 | KR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/KR2011/002758 | 4/18/2011 | WO | 00 | 6/25/2012 |