Embodiments of the invention will be described hereinafter with reference to the accompanying drawings.
As shown in
The prediction/diagnosis unit 3 includes a time series data memory unit 31, primary model creation unit 32, model series memory unit 33, prediction error calculation unit 34, model series candidate creation unit 35, model series candidate memory unit 36, optimal model series selection unit 37, predictor calculation unit 38 and prediction result diagnostic unit 39. The respective units have the following functions. The time series data memory unit 31, model series memory unit 33 and model series candidate memory unit 36 may be respectively configured by different memory devices or configured by a single memory device.
The time series data memory unit 31 stores time series data items sequentially input from the input unit 1.
The primary model creation unit 32 creates a linear model used to predict generation of data items based on a preset number of time series data items.
The model series memory unit 33 stores a model series created by the primary model creation unit 32 or a model series selected by the optimal model series selection unit 37 which will be described later.
The prediction error calculation unit 34 compares a value calculated based on the model series stored in the model series memory unit 33 with a value stored in the time series data memory unit 31 and calculates an error therebetween.
The model series candidate creation unit 35 creates candidates of a plurality of linear model series used to predict time series data stored in the time series data memory unit 31.
The model series candidate memory unit 36 stores a plurality of model series candidates created by the model series candidate creation unit 35.
The optimal model series selection unit 37 selects an optimum model series among the model series stored in the model series candidate memory unit 36 and updates and records the selected model series into the model series memory unit 33.
The predictor calculation unit 38 calculates time at which the output value exceeds a limit and outputs the time to the output unit 2.
The prediction result diagnostic unit 39 estimates the reason for the prediction result by calculation and outputs the reason to the output unit 2. In this embodiment, the input data is assumed to be univariate time series data.
The operation of the time series data prediction/diagnosis apparatus with the above configuration is explained with reference to
The primary model creation unit 32 initializes the time series data stored in the time series data memory unit 31 and the model series stored in the model series memory unit 33 (S10). When univariate time series data is input via the input unit 1, the time series data memory unit 31 additionally stores the data in input order (S11).
Next, the primary model creation unit 32 determines whether or not a primary model can be created based on the number of data items stored in the time series data memory unit 31. If it determines that enough data items to create a primary model are stored in the time series data memory unit 31, it creates a linear model suited to the stored time series data (S12). In this case, the primary model creation unit 32 calculates the linear model coefficients α and β that minimize the error obtained when a preset number of time series data items are applied to equation (1), and stores the model application time range together with the linear model coefficients. In the primary model, the application time range is set to t > 0.
Y=αX+β (1)
The linear model is stored in the model series memory unit 33.
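The coefficient calculation in step S12 can be sketched as follows. This is only an illustration: it assumes that the error to be minimized is the sum of squared errors (the text says only that an error is minimized), and the function name, the sample values and the dictionary layout of the stored model are hypothetical.

```python
def fit_linear_model(xs, ys):
    """Return (alpha, beta) minimizing the squared error of Y = alpha*X + beta."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


# Hypothetical window of five samples; X is taken to be the time index.
times = [1, 2, 3, 4, 5]
values = [2.1, 3.9, 6.2, 7.8, 10.0]
alpha, beta = fit_linear_model(times, values)

# Assumed storage layout: start of the application time range plus coefficients.
primary_model = {"t_start": times[0], "alpha": alpha, "beta": beta}
```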
When new univariate time series data is input via the input unit 1 (S13), it is additionally stored in the time series data memory unit 31 as in step S11.
The prediction error calculation unit 34 calculates an error between the new univariate time series data and a prediction value estimated from the model series stored in the model series memory unit 33 (S14). Specifically, the prediction error calculation unit 34 reads out the univariate time series data at time t from the time series data memory unit 31 and reads out the model coefficients corresponding to the time t from the model series memory unit 33. It then calculates the error between the value calculated according to equation (1) and the univariate time series data read out from the time series data memory unit 31. If the error is smaller than a preset error, the process returns to step S13; if the error is larger than the preset error, the process proceeds to step S15. The preset error may be determined by, for example, calculating, based on the linear model, the errors for the data items considered to fit that model and setting the largest of these errors as a reference error; the process then proceeds to step S15 when the error becomes larger than the reference error.
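The error check in step S14, including the reference error taken as the largest error on already fitted data, might look like the sketch below. The use of the absolute error, the function names and the sample values are assumptions made for illustration.

```python
def predict(model, t):
    """Value of equation (1) at time t for the stored coefficients."""
    return model["alpha"] * t + model["beta"]


def reference_error(model, times, values):
    """Largest absolute error on data already considered to fit the model."""
    return max(abs(v - predict(model, t)) for t, v in zip(times, values))


def needs_new_candidates(model, t, value, ref_err):
    """True when the new sample deviates by more than the reference error."""
    return abs(value - predict(model, t)) > ref_err


# Hypothetical model and fitted data.
model = {"alpha": 2.0, "beta": 0.0}
fit_times = [1, 2, 3, 4]
fit_values = [2.1, 3.9, 6.2, 7.9]
ref = reference_error(model, fit_times, fit_values)

stays_on_model = needs_new_candidates(model, 5, 10.1, ref)  # small deviation
breaks_model = needs_new_candidates(model, 6, 13.5, ref)    # large deviation
```

In this sketch a `False` result corresponds to returning to step S13 and a `True` result to proceeding to step S15.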
If it is determined that the error is larger than the preset error as the result of calculation by the prediction error calculation unit 34, the model series candidate creation unit 35 creates a plurality of new model series (S15). The model series are stored in the model series candidate memory unit 36. The operation of the model series candidate creation unit 35 is explained in detail with reference to
The model series candidate creation unit 35 reads out time series data stored in the time series data memory unit 31 and determines window width derived based on the number of data items required at the time of creation of the primary model. For example, the window width is given as “primary model creation time” from t0 to t1 in
The optimal model series selection unit 37 reads out a model application time range and linear model coefficient for each candidate from the model series candidate memory unit 36 and derives a candidate which minimizes a value obtained by the following equation (2).
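Equation (2) is not reproduced in this text. Purely as an illustration of selecting the candidate with the smallest MDL value, the sketch below substitutes a common two-part MDL form for Gaussian errors, (N/2) log σ² + (p/2) log N, which is not necessarily the form of equation (2). The parameter count of three per segment, the tuple layout of a model series and all sample values are assumptions.

```python
import math


def mdl_score(residuals, n_params):
    """Stand-in two-part MDL: data cost under Gaussian errors plus model cost."""
    n = len(residuals)
    variance = sum(e * e for e in residuals) / n
    variance = max(variance, 1e-12)  # avoid log(0) for a perfect fit
    return 0.5 * n * math.log(variance) + 0.5 * n_params * math.log(n)


def series_residuals(series, times, values):
    """series: list of (t_start, alpha, beta) tuples sorted by t_start."""
    residuals = []
    for t, v in zip(times, values):
        # Use the last segment whose start time is not after t.
        _, alpha, beta = [s for s in series if s[0] <= t][-1]
        residuals.append(v - (alpha * t + beta))
    return residuals


def select_optimal(candidates, times, values):
    def score(series):
        # Assumption: each segment costs two coefficients and one start time.
        return mdl_score(series_residuals(series, times, values), 3 * len(series))
    return min(candidates, key=score)


# Hypothetical data whose slope changes at t = 6.
times = list(range(1, 11))
values = [1.1, 1.9, 3.1, 3.9, 5.1, 7.9, 11.1, 13.9, 17.1, 19.9]
single = [(1, 2.1, -3.05)]
two_segment = [(1, 1.0, 0.0), (6, 3.0, -10.0)]
best = select_optimal([single, two_segment], times, values)
```

With these values the two-segment series is selected, because its much smaller residual variance outweighs the extra model cost.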
Then, one optimal model series is selected from the plurality of model series stored in the model series candidate memory unit 36 (S16). Equation (2) expresses the MDL (minimum description length) criterion obtained on the assumption that the errors ε with respect to the models follow a normal distribution with mean “0” and variance σ, where N indicates the number of data items, σ indicates the variance, and εi indicates the error, that is, the difference between an actual value and the value calculated according to equation (1) using the linear model coefficients read out from the model series memory unit 33. In the example shown in
In the step S16, the predictor calculation unit 38 reads out the linear model coefficient at the time of t>t3 stored in the model series memory unit 33 when the number of constituents of the model series in the model series memory unit 33 increases or when a value of the time series data at the current time t3 exceeds a warning level value. Then, the predictor calculation unit 38 calculates time at which data exceeds a danger (fault) level value (>warning level value) (S17). More specifically, the predictor calculation unit 38 carries out calculation based on the equation (1) or (2) to derive time at which the danger (fault) level value is reached and outputs the thus derived time.
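For a linear model of the form of equation (1), the time at which the danger level is reached follows by solving αt + β = level for t. The sketch below assumes that the danger level lies above the current trend and that X in equation (1) is the time; the function name and the numbers are hypothetical.

```python
def time_to_reach(alpha, beta, level, t_now):
    """Time at which alpha*t + beta reaches level, or None if it never does."""
    if alpha <= 0:
        return None  # a flat or falling trend never reaches a higher level
    t_cross = (level - beta) / alpha
    # If the level has already been reached, report the current time.
    return max(t_cross, t_now)


# Hypothetical latest model segment and danger level.
danger_time = time_to_reach(alpha=0.5, beta=1.0, level=6.0, t_now=4.0)
never = time_to_reach(alpha=-1.0, beta=1.0, level=6.0, t_now=4.0)
```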
The prediction result diagnostic unit 39 reads out a model series stored in the model series memory unit 33 and a time series data set stored in the time series data memory unit 31 and additionally outputs the reason why the prediction is attained (S18). Specifically, the prediction result diagnostic unit 39 reads out final variation time (time t1 in
A time series data prediction/diagnosis apparatus according to a second embodiment is explained with reference to the accompanying drawings. The time series data prediction/diagnosis apparatus according to the embodiment includes an input unit 1, output unit 2 and prediction/diagnosis unit 4. In
The time series data prediction/diagnosis apparatus according to the second embodiment with the above configuration is explained with reference to
First, time series data stored in the time series data memory unit 31, model series stored in the model series memory unit 33 and unit space memory unit 42 are initialized (S200). When multivariate time series data X is input via the input unit 1, the time series data memory unit 31 additionally stores the multivariate time series data items X in an input order (S201).
The unit space calculation unit 41 reads out the number of data items stored in the time series data memory unit 31 and determines whether or not a unit space can be created. The number of data items used to create the unit space is preferably at least three times the number of items (the number of variates). If the unit space can be created, the unit space calculation unit 41 reads out all of the multivariate time series data items X stored in the time series data memory unit 31 and calculates unit space information (S202). Specifically, the unit space calculation unit 41 derives the average and standard deviation of each variate of the input multivariate time series data items X, and calculates the correlation coefficient matrix of the variates and its inverse matrix. The unit space information, namely the averages and standard deviations of the variates, the correlation coefficient matrix and its inverse matrix, is then stored in the unit space memory unit 42.
The output value calculation unit 43 reads out multivariate time series data items X at respective times stored in the time series data memory unit 31 and unit space information stored in the unit space memory unit 42 and calculates an output value Y (S203). In this case, the output value Y is data corresponding to the data of the time series data prediction/diagnosis apparatus according to the first embodiment. Like the first embodiment, the primary model creation unit 32 creates a primary model and the thus created primary model is stored in the model series memory unit 33 as in the first embodiment.
After this, if new time series data X′ having a plurality of items is input, the new time series data X′ is additionally stored in the time series data memory unit 31 as in the step S201 (S204).
The output value calculation unit 43 reads out the time series data X′ finally stored in the time series data memory unit 31 and unit space information stored in the unit space memory unit 42 and calculates an output value Y (S205). Specifically, the output value calculation unit 43 reads out the average of variates and standard deviation from the unit space memory unit 42 with respect to the input multivariate time series data X and normalizes the multivariate time series data X by use of the above values. The output value calculation unit 43 further reads out the inverse matrix of the correlation coefficient matrix from the unit space memory unit 42 and calculates an output value by use of the inverse matrix and normalized multivariate time series data X. In this case, a function indicated by the following equation (3) is used as the calculation function for the output value.
The above equation (3) is one example of the calculation function for the output value and called a Mahalanobis distance in the Taguchi method.
In the equation (3), X(t) is normalized input data at time t and is given by the following equation.
where X(t)^T denotes the transpose of X(t). Further, in the above equation, σi and mi indicate the standard deviation and average of variate i in the unit space, and xi(t) indicates the observed value of variate i at time t or a value obtained by subjecting the observed value to a primary process.
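The unit space calculation of step S202 and the output value calculation can be sketched together as follows. Equation (3) is not reproduced in this text, so the sketch assumes the form commonly used in the Taguchi method, in which the quadratic form of the normalized data with the inverse correlation matrix is divided by the number of variates k. The population standard deviation, the function names and the sample data are likewise assumptions.

```python
import math


def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]


def unit_space(samples):
    """Return (means, stds, inverse correlation matrix) of the samples."""
    n, k = len(samples), len(samples[0])
    means = [sum(row[i] for row in samples) / n for i in range(k)]
    stds = [math.sqrt(sum((row[i] - means[i]) ** 2 for row in samples) / n)
            for i in range(k)]
    z = [[(row[i] - means[i]) / stds[i] for i in range(k)] for row in samples]
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(k)]
            for i in range(k)]
    return means, stds, invert(corr)


def mahalanobis_sq(x, means, stds, inv_corr):
    """Assumed output value: z * R^-1 * z^T / k for the normalized data z."""
    k = len(x)
    z = [(x[i] - means[i]) / stds[i] for i in range(k)]
    return sum(z[i] * inv_corr[i][j] * z[j]
               for i in range(k) for j in range(k)) / k


# Hypothetical unit space: six samples of two variates (three times k).
samples = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0],
           [4.0, 4.0], [5.0, 8.0], [6.0, 7.0]]
means, stds, inv_corr = unit_space(samples)
average = sum(mahalanobis_sq(r, means, stds, inv_corr)
              for r in samples) / len(samples)
outlier = mahalanobis_sq([10.0, 0.0], means, stds, inv_corr)
```

Under these assumptions the output value averages 1 over the unit space itself, so values well above 1 indicate data that departs from the unit space.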
The operation of the steps S206 to S209 is the same as that of the steps S14 to S17 of
The prediction result diagnostic unit 39 reads out the model series stored in the model series memory unit 33 and the time series data set stored in the time series data memory unit 31 and additionally outputs the reason why the prediction is reached (S210). The method is explained in detail below.
A case wherein multivariate time series data X of {{x1(1), x2(1), . . . , xk(1)}, . . . , {x1(τ), x2(τ), . . . , xk(τ)}, . . . , {x1(T), x2(T), . . . , xk(T)}} is input is considered. In this example, the model series is configured by two models, τ indicates variation time of the model series stored in the model series memory unit 33 and T indicates current time.
The prediction result diagnostic unit 39 calculates factors which largely contribute to coincidence of models at time t=1, . . . , τ and derives characteristic values of factors which deviate from the models at respective times of time t=τ, . . . , T by calculation. A set of factors whose characteristic values become larger than the threshold value in both of the above intervals is used as the result of diagnosis for prediction. At this time, transition of the results of diagnosis at time t=τ, . . . , T can be output by extracting a factor variation at each time t=τ, . . . , T.
The flow of the process of the prediction result diagnostic unit 39 is explained in detail with reference to
Time t is initialized to “1” and the average Gbi (i=1, . . . , k) of gains is initialized to “0” (S300).
Then, whether or not time t is before the time τ of the breakpoint of the model series is determined (S301). If the time t is before the time τ (“Yes” in S301), the prediction result diagnostic unit 39 first reads out multivariate time series data X(t) from the time series data memory unit 31 (S302). The multivariate time series data X(t) is assigned to a two-level orthogonal table Ln in which the first level is set to “variate i is used” and the second level to “variate i is not used” (S303). Here, Ln is the two-level orthogonal table of the minimum size n to which k or more variates can be assigned.
A gain difference Gdi(t) (i=1, . . . , k) of each variate, treated as a smaller-the-better characteristic, is derived from the two-level orthogonal table Ln created in step S303 by use of equations (4) and (5) (S304). D(d, t)^2 is the output value (Mahalanobis distance) obtained when an experiment is performed by using only the variates set to the first level in experiment No. d (d=1, . . . , n) at time t.
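One way to construct such a two-level orthogonal table is sketched below. It is restricted to sizes n that are powers of two (L4, L8, L16, ...), which is an assumption of this sketch: other two-level tables such as L12 exist, so the result is not always the smallest possible table. The function name is hypothetical.

```python
def two_level_orthogonal_table(k):
    """Return the rows of a two-level orthogonal table with k columns.

    Entry 1 means "variate is used" and entry 2 means "variate is not used".
    """
    m = 1
    while (1 << m) - 1 < k:
        m += 1
    n = 1 << m  # number of experiments (rows)
    table = []
    for row in range(n):
        # Column c (1..n-1) takes the parity of the bits shared by row and c.
        levels = [1 + bin(row & col).count("1") % 2 for col in range(1, n)]
        table.append(levels[:k])  # keep only the k columns that are needed
    return table


# Five variates need an L8 table (up to seven columns).
table = two_level_orthogonal_table(5)

# Variates used in each experiment, as 0-based indices.
used_per_experiment = [[i for i, level in enumerate(row) if level == 1]
                       for row in table]
```

Each pair of columns contains every combination of levels equally often, which is the property that lets the effect of each variate be separated from the others.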
In order to lower the calculation cost, it is desirable to calculate the inverse matrix of the correlation matrix in each experiment in the step S202 of
$\eta_d(t) = -10 \times \log D(d,t)^2$ (5)
The average gain difference Gdbi (i=1, . . . , k) of each variate is updated by use of the gain difference Gdi(t) (i=1, . . . , k) of each variate (S305).
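Steps S304 and S305 can be sketched as follows. Equation (4) is not reproduced in this text, so the gain difference is assumed here to be the usual factor effect of the Taguchi method: the mean of the values of equation (5) over the experiments in which variate i is used, minus the mean over the experiments in which it is not used. The base-10 logarithm, the function names and the sample distances are also assumptions.

```python
import math


def sn_ratio(d_squared):
    """Equation (5), assuming a base-10 logarithm."""
    return -10.0 * math.log10(d_squared)


def gain_differences(table, d_squared_per_experiment):
    """Assumed gain difference: mean at level 1 minus mean at level 2."""
    etas = [sn_ratio(d2) for d2 in d_squared_per_experiment]
    gains = []
    for i in range(len(table[0])):
        used = [eta for row, eta in zip(table, etas) if row[i] == 1]
        unused = [eta for row, eta in zip(table, etas) if row[i] == 2]
        gains.append(sum(used) / len(used) - sum(unused) / len(unused))
    return gains


def update_running_mean(mean, new_value, t):
    """Average over times 1..t, updated incrementally (t starts at 1)."""
    return mean + (new_value - mean) / t


# L4 table for three variates (1 = used, 2 = not used).
l4 = [[1, 1, 1], [2, 1, 2], [1, 2, 2], [2, 2, 1]]
# Hypothetical distances: large whenever variate 1 is used.
d_squared = [10.0, 1.0, 10.0, 1.0]
gains = gain_differences(l4, d_squared)

# Running average of the gain difference of variate 1 over two times.
average_gain = update_running_mean(0.0, gains[0], 1)
average_gain = update_running_mean(average_gain, -6.0, 2)
```

In this example only the first variate has a non-zero gain difference, since the distance is large exactly when that variate is used.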
Then, the time t is incremented and the process returns to the step S301 (S306). If the time t becomes larger than time τ, the process proceeds to the step S307 (“No” in S301).
In the step S307, if the time t is before the time T, the process proceeds to the step S308 (“Yes” in S307).
The multivariate time series data X(t) is read out from the time series data memory unit 31 by the same procedure as in the step S302 (S308).
The multivariate time series data X(t) is assigned to the two-level orthogonal table Ln by the same procedure as in the step S303 (S309).
A gain difference Gdi (i=1, . . . , k) of each variate, treated as a larger-the-better characteristic, is derived from the two-level orthogonal table Ln created in step S309 by use of equations (4) and (6) (S310).
The average gain difference Gdbi of each variate derived in step S305 and the gain difference Gdi of each variate for the larger-the-better characteristic derived in step S310 are evaluated by use of the following equation (7), and each variate index i whose evaluated value is larger than the threshold value is temporarily stored together with the time t (S311).
Then, the time t is incremented and the process returns to the step S307 (S312) and if the time t becomes larger than the time T, the process proceeds to the step S313 (“No” in S307). After this, the variate index i and time t temporarily stored in the step S311 are read out and sorted according to time t and transition of the gain difference is displayed as graphs as shown in
As shown in
At this time, it is supposed that a diagram shown in
According to the above embodiments, a simple yet highly precise model can be fitted while changes in the information source are accommodated. This is because either a single model or a plurality of divided models is selected as the optimal model, using an efficient window-division method based on the length of the unit space and a criterion that combines the penalty for using a plurality of models with the discrepancy between the data and the model.
Further, a warning can be issued in real time when the model changes. This is because a change in the number of models is taken to indicate a rapid variation in the information, and a warning is issued when the number of models changes as the models are sequentially updated.
Also, a detailed diagnosis of a variation in the model can be performed. This is achieved by analyzing, in the intervals before and after the dividing point of the models, the factors that fit the model before the division and the factors that deviate from the model after the division, and using the factors for which both values are large as the diagnosis of the prediction result.
According to the embodiments, model selection with high precision can be carried out while variations in the information source are accommodated, a warning can be issued in real time when the model changes, and a cause-and-effect relation between the items that cause a variation in the model can be extracted and presented.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2006-173907 | Jun 2006 | JP | national |