The present disclosure relates to a technology for predicting time series data.
In an information system including various apparatuses such as a server and a storage device, various time series data referred to as metric data is measured, and appropriately predicting future values of the metric data is effective in management work such as capacity planning.
In predicting the metric data, it is important to take into account three requirements as follows.
A first requirement is that metric data often has a large fluctuation at non-equidistant timing such as a specific day of week near the end of the month. In the present specification, such a large fluctuation occurring on the metric data at the non-equidistant timing is referred to as a “non-equidistant event.”
A second requirement is that a value not measured in the past often occurs in the future. In general, predicting a value not measured in the past is referred to as “extrapolation.”
A third requirement is that a relative behavior of the metric data continues before and after specific timing such as timing of a change of the month, but an absolute value range often has a large shift. In the present specification, such a shift is referred to as a “base line shift.”
Furthermore, as a method of predicting metric data, there is known a method of analyzing past metric data (for example, metric data in last few months) and predicting future metric data (for example, metric data in next few months). Since the volume of metric data is normally large, it is difficult to perform such analysis manually. For this reason, an approach such as machine learning is used for prediction of metric data.
As a machine learning approach for predicting metric data in consideration of “non-equidistant events,” tree-based (for example, decision tree-based and regression tree-based) approaches capable of handling a behavior in response to a date, a day of week, and the like are prominent. The tree-based approaches include RANDOM FORESTS® and the like.
However, with the tree-based approaches, it is difficult to cope with “extrapolation” and “base line shift.”
To address such a problem, JP-2017-123088-A discloses a technology for selecting feature variables using data in a last long period and then narrowing down the feature variables using most recent data, using a decision tree learning algorithm of a tree-based approach, in order to cope with a most recent change of a tendency in training data.
With the technology described in JP-2017-123088-A, the training data used to create (learn) a final prediction model is considered to be only most recent data and, therefore, the technology may be capable of coping with the “base line shift” occurring immediately afterward. However, the technology described in JP-2017-123088-A is not designed to learn and predict a long-term behavior such as the “non-equidistant event” that may occur monthly. In addition, the technology described in JP-2017-123088-A does not take “extrapolation” into account.
An object of the present invention is to provide a time series data prediction apparatus and a time series data prediction method capable of predicting time series data while coping with a “non-equidistant event,” “extrapolation,” and a “base line shift.”
A time series data prediction device according to one aspect of the present disclosure includes: an event calculation section that calculates event prediction data that is predicted values of time series data in an intended period including a past certain period on a basis of actual measured value data indicating values of the time series data in the past certain period; and a correction section that calculates, as prediction result data that is a prediction result of the time series data, data obtained by shifting each value of the event prediction data in response to a difference between the actual measured value data and the event prediction data in a same period.
According to the present invention, it is possible to predict time series data while coping with a “non-equidistant event,” “extrapolation,” and a “base line shift.”
Embodiments of the present disclosure will be described hereinafter with reference to the drawings.
The management server 101 is a time series data prediction apparatus that predicts a future behavior of metric data that is time series data acquired from instruments to be managed installed in the data center 102. While a type of the metric data is not limited to a specific type, examples of the type of the metric data include sensing data associated with resources of the information system. The management server 101 includes an interface 111, a data acquisition section 112, a future prediction section 113, and an analysis result display section 114. Configurations of the management server 101 will be described later with reference to
The data center 102 is a location where the instruments to be managed are installed. In the present embodiment, the instruments to be managed include a server instrument 121, a network instrument 122, and a storage instrument 123. The instruments to be managed may be a plurality of server instruments 121, a plurality of network instruments 122, and a plurality of storage instruments 123. Furthermore, these instruments are an example of the instruments to be managed and the instruments to be managed are not limited to these instruments.
The network 103 is a communication network that makes the management server 101, the data center 102, the metric DB 104, the console 105, and the like communicable with one another.
The metric DB 104 is a storage device that stores and manages metric data associated with the instruments to be managed.
The console 105 is an instrument used by an administrator of the information system and is an input/output device that receives various information from the administrator and that notifies the administrator of various information by displaying the information, for example.
While the management server 101, the metric DB 104, and the console 105 are disposed outside of the data center 102 in
In
The auxiliary storage device 203 is an apparatus that records data in a data writable and readable fashion, and stores a program for specifying operations of the CPU 201 and the like. The communication interface 204 communicates with an external apparatus such as the console 105 via the network 103. The media interface 205 writes and reads data to and from an external recording medium 207. The input/output device 206 is connected to the console 105 operating the management server 101.
The CPU 201 reads the program for specifying the operations of the management server 101 from the auxiliary storage device 203 to the memory 202, and executes the program using the memory 202, thereby realizing the interface 111, the data acquisition section 112, the future prediction section 113, and the analysis result display section 114 depicted in
It is noted that part of or all of configurations, functions, and the like of the management server 101 may be realized by hardware by, for example, being designed with an integrated circuit.
The metric data has a first feature that a “non-equidistant event” having a large fluctuation at non-equidistant timing such as a specific day of week near the end of a month often occurs. In addition, the metric data has a second feature that “extrapolation” of appearance of a value not measured in the past often occurs. It is noted that
First, as depicted in
Subsequently, as depicted in
Furthermore, as depicted in
As depicted in
Furthermore, as depicted in
In a case of occurrence of a base line shift, as depicted in
The interface 111 receives a processing request to request execution of prediction processing for predicting future values of the metric data from outside (for example, the console 105). The processing request contains, as arguments, a metric class that is a class of metric data to be predicted, and a future prediction period D02 that is a future period in which values of the metric data are to be predicted. The interface 111 outputs the metric class contained in the processing request to the data acquisition section 112, and outputs the future prediction period D02 contained in the processing request to the future prediction section 113.
The data acquisition section 112 acquires past metric data corresponding to the metric class output from the interface 111, from the metric DB 104 of
The future prediction section 113 receives the future prediction period D02 from the interface 111 and the metric data D01 from the data acquisition section 112. The future prediction section 113 generates future prediction data D12 that is a prediction result of the metric data in the future prediction period D02 on the basis of the metric data D01, and outputs the future prediction data D12 to the analysis result display section 114.
The analysis result display section 114 displays a graphical user interface (GUI) that displays information containing the future prediction data D12 from the future prediction section 113.
The future prediction section 113 will be described in more detail hereinafter. Specifically, the future prediction section 113 includes a data division section 401, a trend/event prediction section 402, and a correction section 403.
The data division section 401 extracts training data D03 and validation data D04 from the metric data D01 from the data acquisition section 112 as actual measured value data indicating values of the metric data D01 in a past certain period, and outputs the training data D03 and the validation data D04. The training data D03 is data in a period, for example, from a first month to a (T−1)-th month among the metric data D01. The validation data D04 is data in a period of, for example, a T-th month among the metric data D01. In this case, a period from a beginning of the first month to an end of the T-th month corresponds to the past certain period.
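The division into the training data D03 and the validation data D04 can be sketched as follows. This is a minimal illustration only, assuming daily samples held as (date, value) pairs; the function name `split_by_month` and the sample values are hypothetical and do not appear in the embodiment.

```python
from datetime import date


def split_by_month(samples, validation_month):
    """Split (day, value) samples into training and validation data.

    validation_month is a (year, month) pair standing in for the T-th month;
    every earlier month is treated as training data (assumption of this sketch).
    """
    training, validation = [], []
    for day, value in samples:
        key = (day.year, day.month)
        if key < validation_month:
            training.append((day, value))
        elif key == validation_month:
            validation.append((day, value))
        # Samples later than the validation month are ignored.
    return training, validation


# Hypothetical metric samples spanning two months.
samples = [
    (date(2020, 1, 30), 10.0),
    (date(2020, 1, 31), 12.0),
    (date(2020, 2, 1), 11.0),
    (date(2020, 2, 2), 13.0),
]
training, validation = split_by_month(samples, (2020, 2))
```

In this sketch the period from the first training sample to the last validation sample corresponds to the past certain period.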
The trend/event prediction section 402 receives the future prediction period D02 from the interface 111 and the training data D03 and the validation data D04 from the data division section 401. The trend/event prediction section 402 calculates and outputs trend prediction data D06 indicating a tendency of time series data in an intended period and event prediction data D09 that is predicted values of the time series data in the intended period on the basis of the future prediction period D02, the training data D03, and the validation data D04. The intended period is a period including the certain period (the period of the training data D03 and the period of the validation data D04), more specifically, a period obtained by adding the future prediction period D02 to the certain period.
Specifically, the trend/event prediction section 402 includes an elapsed time extraction section 411, a trend regression section 412, a trend removal section 413, a temporal feature amount extraction section 414, and an event regression section 415.
The elapsed time extraction section 411 generates and outputs elapsed time information D05 indicating elapsed time on the basis of the future prediction period D02, the training data D03, and the validation data D04. The elapsed time information D05 may be information indicating elapsed time since a first point in time of the intended period (a first point in time of the period of the training data D03) until a last point in time of the intended period (a last point in time of the future prediction period D02).
The trend regression section 412 is a trend calculation section that predicts (calculates) trend data indicating the tendency (trend) of the metric data in the intended period on the basis of the future prediction period D02, the training data D03, the validation data D04, and the elapsed time information D05, and outputs the predicted (calculated) trend data as the trend prediction data D06.
The trend removal section 413 generates and outputs residual data D07 that is relative value data indicating relative values of the training data D03 and the validation data D04 to the trend prediction data D06.
The temporal feature amount extraction section 414 generates and outputs a temporal feature amount D08 on the basis of the future prediction period D02, the training data D03, and the validation data D04. The temporal feature amount D08 is values associated with time since the first point in time of the intended period until the last point in time of the intended period, and is, for example, calendar information including months, days of week, and dates. It is noted that the calendar information may contain information such as non-work days and business days.
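One possible form of the temporal feature amount D08 is sketched below using only the Python standard library. The field names, the daily granularity, and the rule that business days are Monday through Friday excluding a caller-supplied holiday set are assumptions made for illustration, not details taken from the embodiment.

```python
from datetime import date, timedelta


def temporal_features(start, end, holidays=frozenset()):
    """Return one calendar-feature record per day from start to end inclusive."""
    rows = []
    day = start
    while day <= end:
        weekday = day.weekday()  # Monday == 0 ... Sunday == 6
        rows.append({
            "date": day,
            "month": day.month,
            "day_of_month": day.day,
            "day_of_week": weekday,
            # Assumed rule: weekdays that are not listed holidays.
            "is_business_day": weekday < 5 and day not in holidays,
        })
        day += timedelta(days=1)
    return rows


# Features across a change of the month (end of February to start of March 2020).
features = temporal_features(date(2020, 2, 28), date(2020, 3, 2))
```

Because such features depend only on the calendar, they can be generated for the future prediction period D02 as well as for the past certain period.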
The event regression section 415 is an event calculation section that predicts (calculates) event data that is predicted values of the metric data in the intended period on the basis of the residual data D07 and the temporal feature amount D08, and that outputs the predicted (calculated) event data as the event prediction data D09. Since the residual data D07 is used in the present embodiment, the event prediction data D09 indicates values exclusive of a trend component of the metric data (an influence of the trend prediction data D06).
The correction section 403 calculates and outputs, as prediction result data about the metric data, the future prediction data D12 obtained by shifting each value of the event prediction data D09 from the trend/event prediction section 402 in response to a reference difference that is a difference between the actual measured value data (training data D03 and validation data D04) and the event prediction data D09. The reference difference is the difference between the actual measured value data and the event prediction data D09 in the same period.
In the present embodiment, the correction section 403 receives the validation data D04 from the data division section 401, and the trend prediction data D06 and the event prediction data D09 from the trend/event prediction section 402. Furthermore, the correction section 403 includes a trend reconstruction section 421, a base line shift determination section 422, and a prediction correction section 423.
The trend reconstruction section 421 generates and outputs, as composite prediction data D10, data obtained by shifting each value of the event prediction data D09 using the trend prediction data D06 as a reference difference. Specifically, the trend reconstruction section 421 adds up the trend prediction data D06 and the event prediction data D09, and generates an addition result as the composite prediction data D10.
The base line shift determination section 422 determines whether a base line shift occurs using the validation data D04 and the composite prediction data D10. Specifically, the base line shift determination section 422 obtains a difference between the validation data D04 and the composite prediction data D10 in a terminal period including an end of the period of the validation data D04 as the reference difference, and determines whether the reference difference is equal to or greater than a certain value. The base line shift determination section 422 determines occurrence of a base line shift in a case in which the reference difference is equal to or greater than the certain value, and determines non-occurrence of a base line shift in a case in which the reference difference is smaller than the certain value.
The base line shift determination section 422 outputs the composite prediction data D10, and further outputs a correction amount D11 in response to the reference difference that is the difference between the validation data D04 and the composite prediction data D10 in the terminal period in the case of occurrence of a base line shift. The correction amount D11 may be, for example, the same value as the reference difference or may be a value obtained by performing predetermined computing on the reference difference.
The prediction correction section 423 calculates and outputs the future prediction data D12 in response to the composite prediction data D10 and the correction amount D11. For example, the prediction correction section 423 outputs, as the future prediction data D12, data obtained by shifting each value of the composite prediction data D10 by the correction amount D11 in the case of the occurrence of a base line shift, and outputs, as the future prediction data D12, the composite prediction data D10 as it is in the case of non-occurrence of a base line shift.
In Step S601, the data division section 401 extracts the training data D03 and the validation data D04 from the metric data D01, and outputs the training data D03 and the validation data D04. The training data D03 is data in the period, for example, from the first month to the (T−1)-th month among the metric data D01. The validation data D04 is data in the period of, for example, the T-th month among the metric data D01.
In Step S602, the elapsed time extraction section 411 generates and outputs the elapsed time information D05 on the basis of the future prediction period D02, the training data D03, and the validation data D04. It is assumed herein that the future prediction period D02 indicates a (T+1)-th month.
In Step S603, the trend regression section 412 generates a trend prediction model with the elapsed time information D05 assumed as explanatory variables and values of the trend data as objective variables on the basis of the training data D03 and the validation data D04. The trend regression section 412 calculates and outputs the trend prediction data D06 indicating the trend of the metric data in the intended period that is a period by adding up the period of the training data D03, the period of the validation data D04, and the future prediction period D02, using the trend prediction model. The trend prediction model is, for example, a linear regression model.
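As a minimal sketch of Step S603, the following fits a straight line by ordinary least squares to the observed values against elapsed time and then evaluates it over the whole intended period, including elapsed times with no measurement. The function names and the sample numbers are hypothetical; an actual implementation might use a regression library instead.

```python
def fit_linear_trend(elapsed, values):
    """Ordinary least-squares fit of values = intercept + slope * elapsed."""
    n = len(elapsed)
    mean_t = sum(elapsed) / n
    mean_y = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in elapsed)
    if var_t == 0:
        raise ValueError("need at least two distinct elapsed times")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(elapsed, values))
    slope = cov / var_t
    intercept = mean_y - slope * mean_t
    return slope, intercept


def predict_trend(slope, intercept, elapsed):
    """Evaluate the fitted line at each elapsed time."""
    return [intercept + slope * t for t in elapsed]


# Observed part of the intended period (training plus validation data).
observed_elapsed = [0, 1, 2, 3]
observed_values = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_linear_trend(observed_elapsed, observed_values)

# The intended period also covers the future prediction period (elapsed 4 and 5).
intended_elapsed = observed_elapsed + [4, 5]
trend = predict_trend(slope, intercept, intended_elapsed)
```

A linear model of this kind continues to grow beyond the range of observed values, which is why the trend component can carry the “extrapolation” that a tree-based model alone has difficulty producing.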
In Step S604, the trend removal section 413 generates and outputs the residual data D07 by subtracting the trend prediction data D06 from the actual measured value data (the training data D03 and the validation data D04). Specifically, the trend removal section 413 generates the residual data D07 by subtracting a value of the trend prediction data D06 from a value of the actual measured value data for every identical clock time.
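The subtraction in Step S604 can be illustrated as below. The sketch assumes that both series are lists ordered by clock time and aligned at the first point in time of the intended period, with the trend series extending further into the future; the name `remove_trend` is hypothetical.

```python
def remove_trend(actual, trend):
    """Subtract the trend value from the measured value at each identical clock time.

    The trend covers the whole intended period, while actual values exist only
    for the past certain period, so zip() stops at the end of the actual data.
    """
    return [a - t for a, t in zip(actual, trend)]


actual = [10.5, 11.0, 15.0]                 # measured values (past only)
trend = [10.0, 12.0, 14.0, 16.0, 18.0]      # trend over the intended period
residual = remove_trend(actual, trend)
```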
In Step S605, the temporal feature amount extraction section 414 generates and outputs the temporal feature amount D08 on the basis of the future prediction period D02, the training data D03, and the validation data D04.
In Step S606, the event regression section 415 generates an event prediction model with the temporal feature amount D08 assumed as explanatory variables and values of the metric data as objective variables on the basis of the residual data D07. The event regression section 415 predicts metric data in the intended period using the event prediction model, and outputs the predicted metric data as the event prediction data D09.
Examples of the event prediction model include a decision tree model. In the present embodiment, the event prediction model is assumed to be a RANDOM FORESTS® model that is a kind of the decision tree model. In this case, the event prediction model includes a plurality of different decision trees each calculating candidate data that serves as a candidate of the event prediction data D09. The event regression section 415 calculates, as values at clock times in the event prediction data D09, a representative value, a lower limit, and an upper limit on the basis of a plurality of pieces of candidate data calculated by the decision trees. The representative value is an average value, a median value, or the like of the values of each piece of candidate data. The lower limit is a 5th percentile value or the like of the values of each piece of candidate data. The upper limit is a 95th percentile value or the like of the values of each piece of candidate data.
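The aggregation of candidate data into a representative value, a lower limit, and an upper limit can be sketched as follows. The sketch does not train any decision trees; it assumes the per-tree candidate values are already available as lists, uses the median as the representative value, and uses a linearly interpolated percentile, all of which are illustrative choices.

```python
import statistics


def percentile(sorted_values, q):
    """Linearly interpolated percentile of an already sorted list, q in [0, 100]."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def summarize_candidates(candidates_per_tree):
    """candidates_per_tree[i][t] is the candidate value of tree i at clock time t."""
    summary = []
    for values_at_t in zip(*candidates_per_tree):
        ordered = sorted(values_at_t)
        summary.append({
            "lower": percentile(ordered, 5),
            "representative": statistics.median(ordered),
            "upper": percentile(ordered, 95),
        })
    return summary


# Hypothetical candidate data from five trees at two clock times.
candidates_per_tree = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [5.0, 50.0],
]
summary = summarize_candidates(candidates_per_tree)
```

The spread between the lower and upper limits gives a prediction range at each clock time alongside the representative value.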
In Step S607, the trend reconstruction section 421 adds up the trend prediction data D06 and the event prediction data D09, and outputs the addition result as the composite prediction data D10.
In Step S608, the base line shift determination section 422 determines whether a base line shift occurs by comparing a value in the terminal period of the validation data D04 with a value of the composite prediction data D10 in the same period as the terminal period.
Specifically, the base line shift determination section 422 obtains an absolute value of a difference between the value in the terminal period of the validation data D04 and the value of the composite prediction data D10 in the same period as the terminal period as the reference difference, and determines whether the reference difference is equal to or greater than the certain value. In this case, the base line shift determination section 422 determines occurrence of a base line shift in the case in which the reference difference is equal to or greater than the certain value, and determines non-occurrence of a base line shift in the case in which the reference difference is smaller than the certain value. It is noted that the terminal period of the validation data D04 may, for example, be only a terminal clock time of the validation data D04 or include a plurality of clock times including the terminal clock time. In a case in which the terminal period includes the plurality of clock times, the reference difference is an average value or the like of the differences at the respective clock times included in the terminal period.
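The determination in Step S608 can be sketched as below. Several points are assumptions of this sketch rather than statements of the embodiment: the composite prediction passed in has already been restricted to the validation period, the differences are averaged with their signs and the absolute value of that average is compared with the threshold, and the signed average is returned so that a later correction knows the direction of the shift. The names are hypothetical.

```python
def detect_base_line_shift(validation, composite, threshold, terminal_len=1):
    """Return (shift_detected, signed_difference) for the terminal period.

    validation and composite must cover the same clock times; the terminal
    period is taken as the last terminal_len clock times.
    """
    if len(validation) != len(composite):
        raise ValueError("series must cover the same period")
    if not 1 <= terminal_len <= len(validation):
        raise ValueError("terminal_len must be between 1 and len(validation)")
    diffs = [
        v - c
        for v, c in zip(validation[-terminal_len:], composite[-terminal_len:])
    ]
    signed = sum(diffs) / len(diffs)
    return abs(signed) >= threshold, signed


# Hypothetical values: the measured level jumps near the end of the month.
validation = [100.0, 101.0, 130.0, 131.0]
composite = [100.5, 100.5, 101.0, 101.0]
shifted, signed = detect_base_line_shift(
    validation, composite, threshold=10.0, terminal_len=2
)
```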
The processing of Step S609 is executed in the case of occurrence of a base line shift, and the processing of Step S610 is executed in the case of non-occurrence of a base line shift.
In Step S609, the base line shift determination section 422 calculates the correction amount D11 in response to the reference difference, and outputs the composite prediction data D10 and the correction amount D11. The prediction correction section 423 generates, as the future prediction data D12, data obtained by shifting each value of the composite prediction data D10 from the base line shift determination section 422 by the correction amount D11, outputs the future prediction data D12 to the analysis result display section 114, and ends the processing.
On the other hand, in Step S610, the base line shift determination section 422 outputs the composite prediction data D10. The prediction correction section 423 outputs, as the future prediction data D12, the composite prediction data D10 from the base line shift determination section 422 to the analysis result display section 114, and ends the processing.
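The branch between Steps S609 and S610 can be summarized in a few lines. In this sketch the correction amount D11 is assumed to be a single signed offset added to every value of the composite prediction data D10, and `None` is used to represent non-occurrence of a base line shift; both conventions and the function name are illustrative.

```python
def apply_correction(composite, correction_amount=None):
    """Return the future prediction data for the given composite prediction.

    With a correction amount (Step S609) every value is shifted by it;
    without one (Step S610) the composite prediction is returned unchanged.
    """
    if correction_amount is None:
        return list(composite)
    return [value + correction_amount for value in composite]


composite = [101.0, 102.0, 103.0]
corrected = apply_correction(composite, correction_amount=29.5)   # S609
uncorrected = apply_correction(composite)                         # S610
```

Shifting by a constant preserves the relative behavior predicted by the trend and event models while moving the absolute value range to the level observed in the terminal period.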
The operations described so far are given as an example only and the operations are not limited to these operations. For example, as a modified embodiment, the future prediction section 113 may repeatedly calculate the composite prediction data D10 while shifting the periods of the training data D03 and the validation data D04.
For example, in Step S601, the data division section 401 assumes data in a period from the first month to an (S−1)-th month as the training data D03 among the metric data D01, and assumes an S-th month among the metric data D01 as the validation data D04. It is assumed that the future prediction period D02 is a (T+1)-th month and that S is a value equal to or smaller than T.
Furthermore, when Step S608 is over, the base line shift determination section 422 determines whether the period of the validation data D04 (S-th month) reaches a final period. The final period is specified in advance and is, for example, one month before the future prediction period D02 (T-th month).
In a case in which the period of the validation data D04 (S-th month) does not reach the final period, then the periods of the training data D03 and the validation data D04 are shifted, and the processing returns to Step S601. For example, S is incremented by 1 and the processing returns to Step S601. At this time, in a case in which occurrence of a base line shift is determined in Step S608, the trend regression section 412 generates a trend prediction model by a different approach such as a change of a period of the elapsed time used for prediction of the trend data.
The processing goes to Step S609 in a case in which the period of the validation data D04 reaches the final period and occurrence of a base line shift is determined in immediately preceding Step S608, and the processing goes to Step S610 in a case in which the period of the validation data D04 reaches the final period and non-occurrence of a base line shift is determined in immediately preceding Step S608.
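The control flow of the modified embodiment can be sketched as a loop over the validation month S. The two callables below are hypothetical stand-ins for Steps S601 to S607 and Step S608, respectively, and the sketch assumes that a detected shift changes the trend modeling approach only for the immediately following iteration, which the text leaves open.

```python
def rolling_validation(first_s, final_s, predict, detect_shift):
    """Repeat prediction for S = first_s .. final_s and return the last outcome.

    predict(s, alternative_trend) -> composite prediction for validation month s
    detect_shift(s, composite)    -> True when a base line shift is determined
    """
    if first_s > final_s:
        raise ValueError("first_s must not exceed final_s")
    alternative_trend = False
    for s in range(first_s, final_s + 1):
        composite = predict(s, alternative_trend)
        shifted = detect_shift(s, composite)
        # A detected shift makes the next iteration build the trend model differently.
        alternative_trend = shifted
    # The final value of shifted selects Step S609 (True) or Step S610 (False).
    return composite, shifted


calls = []


def predict(s, alternative_trend):
    calls.append((s, alternative_trend))
    return [float(s)]  # placeholder prediction


def detect_shift(s, composite):
    return s == 2  # pretend a shift is seen only when validating month 2


composite, shifted = rolling_validation(1, 3, predict, detect_shift)
```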
It is noted that a flow of information added in a case of applying the modified embodiment described above is denoted by dotted lines in
A GUI 700 depicted in
In the embodiments described so far, a combination of the linear regression model that is the trend prediction model and a nonlinear regression model (specifically, RANDOM FORESTS® model) that is the event prediction model is used. As the nonlinear regression model, a neural network may be used. For example, the event regression section 415 applies the training data D03 to each of N neural networks, calculates N pieces of event prediction data, and extracts K neural networks higher in prediction accuracy for the validation data D04 from those N pieces of event prediction data. The event regression section 415 calculates the event prediction data with values of each of the K neural networks assumed as explanatory variables and values of the validation data D04 assumed as objective variables in the period of the validation data D04.
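The selection of the K neural networks can be illustrated without training any network, given the N pieces of event prediction data for the validation period. The sketch below ranks the candidates by mean absolute error, which is one possible accuracy measure and is an assumption here; the subsequent regression that combines the K outputs is omitted, and the names are hypothetical.

```python
def select_top_k(candidate_predictions, validation, k):
    """Return the indices of the k candidates with the lowest mean absolute error."""
    def mean_abs_error(prediction):
        total = sum(abs(p - v) for p, v in zip(prediction, validation))
        return total / len(validation)

    ranked = sorted(
        range(len(candidate_predictions)),
        key=lambda i: mean_abs_error(candidate_predictions[i]),
    )
    return ranked[:k]


validation = [10.0, 12.0, 11.0]
candidate_predictions = [
    [10.5, 12.5, 11.5],   # candidate 0: consistently 0.5 too high
    [14.0, 15.0, 16.0],   # candidate 1: far from the measured values
    [10.0, 12.0, 11.5],   # candidate 2: closest overall
]
best = select_top_k(candidate_predictions, validation, k=2)
```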
As described so far, according to the present embodiments, the event regression section 415 calculates the event prediction data D09 that is predicted values of the metric data in the intended period including the past certain period on the basis of the actual measured value data (training data D03 and validation data D04) indicating values of the metric data in the past certain period. The correction section 403 calculates, as the prediction result data that is the prediction result of the time series data, the data obtained by shifting each value of the event prediction data D09 in response to the difference between the actual measured value data and the event prediction data D09 in the same period. Owing to this, each value of the event prediction data D09 is shifted in response to the difference between the actual measured value data and the event prediction data D09; thus, it is possible to appropriately predict the metric data even in a case of occurrence of “extrapolation” and a “base line shift.”
Furthermore, in the present embodiments, the trend regression section 412 calculates the trend prediction data D06 indicating the tendency of the metric data in the intended period on the basis of the actual measured value data. The trend removal section 413 calculates the residual data D07 that is the relative value data indicating relative values of the actual measured value data to the trend prediction data D06. The event regression section 415 calculates the event prediction data D09 on the basis of the residual data D07. The correction section 403 shifts each value of the event prediction data D09 using the trend prediction data D06 as the reference difference. Owing to this, it is possible to predict the metric data more appropriately in the case of occurrence of the “extrapolation.”
Moreover, in the present embodiments, the trend regression section 412 generates the trend prediction model with the elapsed time assumed as explanatory variables and values of the metric data assumed as objective variables on the basis of the actual measured value data, and calculates the trend prediction data D06 using the trend prediction model. The trend prediction model is, in particular, a linear regression model. In this case, it is possible to predict the tendency of the metric data in the intended period more appropriately.
Furthermore, in the present embodiments, the event regression section 415 generates the event prediction model with the calendar information assumed as explanatory variables and the values of the metric data assumed as objective variables on the basis of the residual data D07, and calculates the event prediction data D09 using the event prediction model. The event prediction model includes, in particular, a decision tree model. In this case, it is possible to predict an event component of the metric data more appropriately.
Moreover, in the present embodiments, the decision tree model includes a plurality of decision trees each calculating candidate data that serves as a candidate of the event prediction data, and the lower limit, the representative value, and the upper limit are calculated as values of the event prediction data D09 on the basis of a plurality of pieces of candidate data. In this case, it is possible to indicate a prediction range of the metric data.
Furthermore, in the present embodiments, the correction section 403 shifts each value of the event prediction data D09 in response to the difference between the validation data D04 and the composite prediction data D10 in the terminal period of the validation data D04. In this case, it is possible to predict the metric data more appropriately in the case of occurrence of the “base line shift.”
The event regression section 415 repeatedly calculates the event prediction data D09 while shifting the certain period. The trend regression section 412 changes a trend prediction data calculation method in a case in which it is determined that the reference difference is equal to or greater than the certain value. In this case, it is possible to predict the tendency of the metric data in the intended period more appropriately.
The embodiments of the present disclosure described above are exemplarily given for describing the present disclosure and not intended to limit the scope of the present disclosure only to the embodiments. A person skilled in the art can carry out the present disclosure in various other manners without departure from the scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
2020-029523 | Feb 2020 | JP | national |