The present subject matter relates to electronic computing systems, and in particular relates to data forecasts in a predictive-analytics environment.
Machine-learning (ML) models have been developed as predictive-analysis tools for drawing predictions such as sales forecasts. Such models usually receive an input data set of time-series index data comprising independent predictor variables (e.g., historical sales) to forecast the sales of a target product, wherein the sales of the target product act as the predicted variable. Likewise, state-of-the-art examples may be construed to cover other indicia such as production, manufacturing, inflation, price, etc.
One constraint associated with existing predictive analytics is the requirement that the set of input index data share a uniform timescale. For example, all input data must be recorded on a monthly basis to forecast the monthly sales of a target product. This may be quite restrictive, especially since multiple index data may be recorded on heterogeneous timescales. For example, a GDP index is commonly recorded quarterly and PMI monthly, while IHS Markit car data is recorded in a mix of monthly, quarterly, and yearly intervals.
State-of-the-art predictive analytics do not substantially utilize all the valuable data in one forecasting model, and accordingly employ different models for different timescales. In an example of state-of-the-art predictive analytics as depicted in
In another example of state-of-the-art predictive analysis, a time interval of the time-series data is selected from a group consisting of the time intervals of the data sets. The method further includes “down-sampling” the observations of a first data set and converting the time interval of the first data set to the time interval of the time-series data. Overall, this disclosure refers to determining the time interval of input data, performing down-sampling and feature engineering, and forecasting results.
However, said state-of-the-art techniques fail to perform an optimized forecast with respect to heterogeneous time-scale inputs, i.e., inputs from different timescales, or to forecast using lower-timescale data. Such example heterogeneous time-scale inputs (monthly, yearly, mixed) have been depicted in
Even the example disclosure relating to down-sampling and feature engineering at least fails to refer to any time-series or time-domain based up-scaling of the data.
Overall, state-of-the-art predictive analytics and forecast models do not perform sales forecasts by accepting heterogeneous-timescale (time-granularity) data as input, and accordingly fall substantially short of maximizing the use of valuable information to obtain improved sales forecasts.
This summary is provided to introduce a selection of concepts in a simplified format that is further described in the detailed description of the present disclosure. This summary is neither intended to identify key inventive concepts of the disclosure nor is it intended for determining the scope of the invention or disclosure.
In an embodiment, the present subject matter refers to a method for forecasting demand with respect to an entity. The method comprises receiving a plurality of input data-sets associated with time-series data, wherein each of said data-sets represents a time-based variation of one or more variables in accordance with a designated time interval. At least one transformation result is generated by transforming the time intervals of at least one input dataset based on a plurality of time-interval transformation models. A plurality of first intermediate forecast results is predicted from the at least one transformation result based on a plurality of demand forecasting models. An aggregated result is generated from the plurality of first intermediate forecast results through an ensemble model, and said aggregated result is rendered as a final prediction result.
In another embodiment, the present subject matter refers to a method for forecasting based on time-series datasets. The method comprises receiving a plurality of input data-sets associated with time-series data, wherein each of said data sets represents a time-based variation of one or more variables in accordance with a designated time-scale. A time-scale of at least one of said plurality of data sets is transformed based on at least one time-scale transformation model to generate at least one transformed dataset. A plurality of intermediate prediction results is generated based on a plurality of demand forecasting models from the at least one transformed dataset and at least one input data set other than the transformed data set. Further, an aggregated prediction result is generated from the plurality of intermediate prediction results based on an ensemble-learning model.
The objects and advantages of the embodiments will be realized and achieved at least by the elements, features, and combinations particularly pointed out in the claims. It is to be understood that both the foregoing general description and the following detailed description are representative and explanatory and are not restrictive of the invention, as claimed.
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
The elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
For the purpose of promoting an understanding of the principles of the present disclosure, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will be understood that no limitation of the scope of the present disclosure is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the present disclosure as illustrated therein being contemplated as would normally occur to one skilled in the art to which the present disclosure relates.
The foregoing general description and the following detailed description are explanatory of the present disclosure and are not intended to be restrictive thereof.
Reference throughout this specification to “an aspect”, “another aspect” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase “in an embodiment”, “in another embodiment” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or subsystems or elements or structures or components preceded by “comprises . . . a” does not, without more constraints, preclude the existence of other devices or other subsystems or other elements or other structures or other components or additional devices or additional subsystems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this present disclosure belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Further, the method comprises generating (step 204) at least one transformation result by transforming the time intervals of at least one input dataset based on a plurality of time-interval transformation models. Transforming the time intervals comprises unifying the time intervals across the input datasets based on the plurality of time-interval transformation models. Specifically, the time interval of a second input dataset is transformed to match the time interval of a first input dataset based on the plurality of time-interval transformation models. In an implementation, the transformation result is predicted as an ensemble-model result from a plurality of intermediate transformation results using one or more functions of the error values at a plurality of training data points, the error values at a plurality of validation data points, and the second input dataset.
Further, the method comprises predicting (step 206) a plurality of first intermediate forecast results from the at least one transformation result based on a plurality of demand forecasting models. In an implementation, the plurality of demand forecasting models predicts the plurality of first intermediate forecast results based on at least one of: the transformation result, and a third input dataset having the same time interval as, or a different time interval from, the transformation result.
Further, the method comprises generating (step 208) an aggregated result from the plurality of first intermediate forecast results through an ensemble model, and rendering said aggregated result as a final prediction result. In an implementation, generating the aggregated result based on the ensemble model comprises the steps of a) selecting a plurality of second intermediate forecast results from the plurality of first intermediate forecast results; and b) generating the aggregated result by combining the plurality of second intermediate forecast results.
In an example, selecting the plurality of second intermediate forecast results comprises generating a first type of distribution for each time interval from the plurality of first intermediate forecast results. Optionally, a second type of distribution is also generated from the first distribution. Based on said first or second distribution, the second intermediate forecast results are selected from the plurality of first intermediate forecast results based on one or more of: a training error, a validation error, and derivatives of said training and validation errors, such as an error variance, said errors and derivatives being associated with the plurality of first intermediate forecast results. In an example, the selection may also be based on an objective optimization function of said training error, said validation error, the derivatives of the training error and the validation error, and combinations thereof associated with the plurality of first intermediate forecast results.
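Selection by an objective optimization function can be illustrated with a short sketch. Purely as an assumption for illustration, the objective is taken as a linear combination w_te·TE + w_ve·VE + w_var·VAR, and a model is shortlisted when its objective is at or below a chosen percentile across all models; the function name `select_models`, the weights, and the default percentile are hypothetical and not taken from the disclosure.

```python
import numpy as np

def select_models(errors, w_te=0.3, w_ve=0.5, w_var=0.2, percentile=40.0):
    """Shortlist models whose objective value is at or below a percentile.

    errors: dict mapping a model name to a (TE, VE, VAR) tuple.
    The linear objective is a hypothetical example of an
    "objective optimization function" of the errors.
    """
    names = list(errors)
    objective = np.array([w_te * te + w_ve * ve + w_var * var
                          for te, ve, var in (errors[n] for n in names)])
    # percentile of the objective across all candidate models
    cutoff = np.percentile(objective, percentile)
    # keep models whose objective does not exceed the cut-off
    return [n for n, obj in zip(names, objective) if obj <= cutoff]
```

For example, with the weights set to rank purely on validation error (`w_te=0, w_ve=1, w_var=0`) and `percentile=40`, five models with validation errors 1 to 5 yield a cut-off of 2.6, so only the two best models are kept.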
The method further comprises transforming (step 304) the time-scale of at least one of said plurality of data sets based on at least one time-scale transformation model to generate at least one transformed dataset. The at least one transformed dataset is associated with the higher time scale among the heterogeneous time scales of the received input data-sets, and accordingly has the higher time granularity among them. Transforming the time scale of the at least one input data set through the transformation model comprises executing a first plurality of machine-learning and time-series models over the at least one input data set to obtain a plurality of intermediate transformation data sets. In addition, an ensemble-learning model is executed to aggregate the plurality of intermediate transformation data sets into an aggregated output, which serves as said at least one transformed data set.
The method further comprises predicting (step 306) a plurality of intermediate prediction results based on a plurality of demand forecasting models from the at least one transformed dataset and at least one input data set other than the transformed data set. Predicting the plurality of intermediate prediction results comprises executing a second plurality of machine-learning and time-series models over the at least one transformed data set and the at least one input data set to obtain said intermediate prediction results.
The method further comprises generating (step 308) an aggregated prediction result from the plurality of intermediate prediction results based on an ensemble-learning model. Generating the aggregated prediction result based on the ensemble-learning model comprises selecting at least a subset of said intermediate prediction results based on any function of training error, validation error, derivatives of said errors, combinations thereof, and a percentile setting associated with said second plurality of machine-learning and time-series models. Thereafter, a high time scale ensembled forecast and a low time scale ensembled forecast are generated from the selected prediction results. Further, a final high time-scale forecast result is generated by adjusting the high time scale ensembled forecast using the low time scale ensembled forecast.
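As a rough sketch of step 304, the snippet below transforms a quarterly (low time scale) series into a monthly (high time scale) series with three simple transformation models and then aggregates the intermediate results with a weighted average. The three models (flat fill, linear interpolation anchored at the middle month of each quarter, and a mean-preserving variant of it), the equal default weights, and all function names are illustrative assumptions; the disclosure contemplates machine-learning and time-series models in general.

```python
import numpy as np

def flat_fill(q, j=3):
    # every month in a quarter takes that quarter's value
    return np.repeat(np.asarray(q, float), j)

def linear_interp(q, j=3):
    # anchor each quarterly value at its middle month and interpolate linearly;
    # np.interp holds the end values constant outside the anchored range
    q = np.asarray(q, float)
    x_q = np.arange(len(q)) * j + (j - 1) / 2
    x_m = np.arange(len(q) * j)
    return np.interp(x_m, x_q, q)

def mean_preserving_interp(q, j=3):
    # shift each quarter's interpolated months so their mean equals the quarterly value
    q = np.asarray(q, float)
    m = linear_interp(q, j).reshape(len(q), j)
    m += (q - m.mean(axis=1))[:, None]
    return m.ravel()

def transform_to_monthly(q, j=3, weights=None):
    """Aggregate several quarterly-to-monthly transformations (hypothetical helper)."""
    models = (flat_fill, linear_interp, mean_preserving_interp)
    candidates = np.vstack([f(q, j) for f in models])  # intermediate transformation data sets
    w = np.ones(len(models)) if weights is None else np.asarray(weights, float)
    return (w / w.sum()) @ candidates
```

In practice the weights would come from the error-based ensemble described above rather than being fixed; they are left as a parameter here.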
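A minimal sketch of step 306 follows. It fits a few simple models on predictors that would consist of the transformed monthly index together with another monthly input, scores each model on a hold-out validation window, and returns its training error, validation error, and forecast as one intermediate prediction result. The three models (training-mean baseline, ordinary least squares, ridge regression), the use of mean absolute error, and all function names are assumptions for illustration; any machine-learning or time-series model could take their place.

```python
import numpy as np

def _design(X):
    # prepend an intercept column to the predictor matrix
    X = np.asarray(X, float).reshape(len(X), -1)
    return np.column_stack([np.ones(len(X)), X])

def fit_mean(X, y):
    # baseline that ignores the predictors
    m = float(np.mean(y))
    return lambda Xn: np.full(len(Xn), m)

def fit_ols(X, y):
    coef, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return lambda Xn: _design(Xn) @ coef

def fit_ridge(X, y, lam=1.0):
    A = _design(X)
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    coef = np.linalg.solve(A.T @ A + penalty, A.T @ y)
    return lambda Xn: _design(Xn) @ coef

def intermediate_predictions(X, y, X_future, n_val):
    """Return {model name: (training MAE, validation MAE, forecast)} (hypothetical helper)."""
    X = np.asarray(X, float).reshape(len(X), -1)
    y = np.asarray(y, float)
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_va, y_va = X[-n_val:], y[-n_val:]
    results = {}
    for name, fit in (("mean", fit_mean), ("ols", fit_ols), ("ridge", fit_ridge)):
        model = fit(X_tr, y_tr)
        te = float(np.mean(np.abs(y_tr - model(X_tr))))
        ve = float(np.mean(np.abs(y_va - model(X_va))))
        results[name] = (te, ve, model(X_future))
    return results
```

The returned errors are exactly what the later ensemble steps need for model selection and weighting.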
In an example, such generating of said final high time-scale forecast result comprises integrating the high time scale ensembled forecast into a corresponding low-time scale ensembled forecast. One or more weights are determined based on any function of one or more of a training error, a validation error, the derivatives, the combinations thereof associated with the second plurality of machine learning models and time series models. Thereafter, the high time scale ensembled forecast is adjusted based on one or more low time scale ensembled forecasts and said one or more weights. Accordingly, the final high time-scale forecast is generated as the adjusted high time scale ensembled forecast.
The input index data may have mixed time-scale intervals, e.g., with monthly and quarterly records. For example, a GDP index is commonly recorded quarterly and PMI monthly, while IHS Markit car data is recorded in a mix of monthly, quarterly, and yearly intervals. In the present example as depicted in
Further, as shown in
In the present example, the quarterly data (i.e. J points) corresponds to a low time scale and is proposed to be transformed to monthly data (JN points). Accordingly,
The aforesaid example errors, such as the training error, validation error, and forecast error (TE, VE, FE), and the optimized objective functions based on said errors, such as the mean absolute percentage error (MAPE), mean absolute error (MAE), and model error variance (VAR), may be depicted as in the following Table 1:
Examples of optimized functions based on errors include a percentile of MAPE (TE, VE, FE) and/or VAR (TE, VE, FE), any other combination of errors, any other functions or derivatives of errors, or any combination of such functions or derivatives. A GUI may be provided to enable manual selection among the aforesaid types of errors or optimized functions.
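The error measures named above can be computed as in the following sketch. The sign convention (error = actual - forecast) and the use of the population variance (ddof=0) for VAR are assumptions, and MAPE additionally assumes that no actual value is zero. Applied separately to the training window, the validation window, and the forecast window, each function yields its TE, VE, and FE variants.

```python
import numpy as np

def mape(actual, forecast):
    # mean absolute percentage error, in percent; actual values must be non-zero
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((a - f) / a)) * 100.0)

def mae(actual, forecast):
    # mean absolute error
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(a - f)))

def error_variance(actual, forecast):
    # population variance of the errors (ddof=0)
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.var(a - f))
```

A percentile of these values across models, such as `np.percentile(list_of_mapes, 50)`, gives the "percentile of MAPE" style of objective mentioned above.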
Thereafter and a part of operation of
An example of ensemble learning is depicted in the following Table 2.
As may be understood from Table 2 and Step 2, the calculated adjusted forecast_n corresponds to the final transformed high-timescale index result as depicted in
As a part of ensemble step 1, multiple forecasts for each time-series or time-domain input are considered as the first intermediate results, as rendered from
In an example, Model-selection criteria examples may include
The following Table 3 represents example Model i error definitions:
Overall, as a part of ensemble step 1, a first type of distribution, such as box plots, is generated for each time interval from the plurality of first intermediate forecast results. Optionally, a second type of distribution, such as histograms, may be generated from the first distribution. Based on said first or second distribution, the second intermediate forecast results are selected from the plurality of first intermediate prediction results based on a training error, a validation error, a forecast difference (if available), and derivatives of said training and validation errors and forecast difference, such as an error variance, said errors and derivatives being associated with the plurality of first intermediate prediction results. In another example, the selection basis may be an objective optimization function of said training error, said validation error, the derivatives of the training error and the validation error, and combinations thereof associated with the plurality of first intermediate prediction results. In yet another example, the selection basis may be a percentile setting associated with said second plurality of machine-learning models and time-series models.
As a part of ensemble step 2, averaging is performed over each of the forecasts shortlisted in ensemble step 1 to output one or more averaged time-domain forecasts, which may again correspond to the high time domain or the low time domain. The averaging denotes computing, by the model ensemble, a weighted average using weights assigned to each selected model's results at each data point, the weights being based on a) the validation error, or b) more sophisticated functions of the training error, the validation error, the forecast difference (if applicable), or any combination thereof. Overall, generating the aggregated result comprises calculating a weighted average of said shortlisted, or second intermediate, forecast results to generate the final prediction result through ensemble step 3, as described later. The results generated as a part of ensemble step 2 correspond to a high time scale ensembled forecast and a low time scale ensembled forecast generated from the selected results.
An example of the model ensemble (sophisticated method) for ensemble step 2 is depicted in Table 4 below.
Overall, ensemble steps 1 and 2 use an ensemble of machine-learning models to generate an empirical cumulative probability distribution of the forecast. Thereafter, an optimal percentile range is chosen based on Table 3, and the forecasts of different time scales are computed through Table 4 as a weighted average of different percentiles of the empirical cumulative probability distribution.
The following Table 5 depicts the ensemble of different-timescale forecasts as a sequence of steps 1101, 1102 and 1103.
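One way to use the per-time-interval box-plot distribution of ensemble step 1 is sketched below. At each time interval the forecasts of all models are summarised by their quartiles, and a model is shortlisted only if its forecast never falls outside the box-plot whiskers. The 1.5 × IQR whisker rule (Tukey's convention) and the function name are assumptions for illustration, not the disclosed selection rule.

```python
import numpy as np

def shortlist_by_boxplot(forecasts, whisker=1.5):
    """Return indices of models whose forecasts stay within the whiskers at every time step.

    forecasts: array of shape (n_models, n_time_intervals).
    """
    f = np.asarray(forecasts, float)
    q1, q3 = np.percentile(f, [25, 75], axis=0)  # per-time-interval quartiles
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    inside = (f >= lower) & (f <= upper)  # broadcast the bounds over all models
    return [i for i in range(f.shape[0]) if inside[i].all()]
```

With four models forecasting `[10, 11]`, `[10.5, 11.5]`, `[9.5, 10.5]` and `[30, 40]`, the last model lies above the upper whisker at both time intervals and is dropped.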
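Ensemble step 2 can be sketched as a weighted average in which each shortlisted model is weighted by the inverse of its validation error, optionally with a separate error for every data point. Inverse-error weighting is only one simple choice assumed here; the text above also allows more sophisticated functions of the training error, validation error, and forecast difference. The function name and the small `eps` guard are hypothetical.

```python
import numpy as np

def weighted_ensemble(forecasts, val_errors, eps=1e-12):
    """Weighted average of shortlisted forecasts.

    forecasts:  shape (n_models, n_time_intervals).
    val_errors: shape (n_models,) for one weight per model, or
                shape (n_models, n_time_intervals) for per-data-point weights.
    Weights are proportional to 1 / validation error (assumed weighting scheme).
    """
    f = np.asarray(forecasts, float)
    e = np.asarray(val_errors, float)
    if e.ndim == 1:
        e = e[:, None]  # same weight for every time interval
    w = 1.0 / (e + eps)
    w = w / w.sum(axis=0, keepdims=True)  # weights sum to 1 at each time interval
    return (w * f).sum(axis=0)
```

Applied once to the shortlisted high-timescale forecasts and once to the low-timescale forecasts, this yields the two ensembled forecasts that step 2 produces.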
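The percentile-based combination described above can be sketched as follows. At each time interval, the ensemble's forecasts form an empirical distribution; selected percentiles are read off it (here with NumPy's default linear interpolation between order statistics) and combined by a weighted average. The default percentiles, the equal weights, and the function name are placeholder assumptions and are not taken from Table 3 or Table 4.

```python
import numpy as np

def percentile_ensemble(forecasts, percentiles=(40, 50, 60), weights=None):
    """Weighted average of chosen percentiles of the per-interval forecast distribution.

    forecasts: shape (n_models, n_time_intervals).
    """
    f = np.asarray(forecasts, float)
    p = np.percentile(f, percentiles, axis=0)  # shape (len(percentiles), n_time_intervals)
    w = np.ones(len(percentiles)) if weights is None else np.asarray(weights, float)
    return (w / w.sum()) @ p
```

Narrowing or shifting the percentile range is the kind of "optimal percentile setting" a domain expert might tune through the interface.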
Steps 1101 to 1103 are also depicted in the form of a control flow in
At step 1101, the high-timescale forecast (S1) obtained from ensemble step 2 is integrated into low-timescale data as follows:

$$S_{1A}(i) = \sum_{k=(i-1)J+1}^{iJ} S_1(k)$$

where J is the number of high-timescale periods of S1 contained in one low-timescale period, e.g., J = 3 when 3 months are integrated into 1 quarter.
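The integration of step 1101 amounts to summing each consecutive block of J high-timescale values. The sketch below assumes that the length of S1 is a whole multiple of J; the function name is hypothetical.

```python
import numpy as np

def integrate_to_low_timescale(s1, j=3):
    # S1A(i) = sum of S1(k) for k = (i-1)*J+1 .. i*J (1-based), i.e. block sums of length J
    s1 = np.asarray(s1, float)
    if len(s1) % j:
        raise ValueError("length of S1 must be a multiple of J")
    return s1.reshape(-1, j).sum(axis=1)
```

For six monthly values 1 to 6 and J = 3, this returns the two quarterly totals 6 and 15.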
At step 1102, the optimization options listed in Table 5 are executed to obtain the parameters (α1, α2). Specifically, step 1102 corresponds to determining one or more weights based on any function of one or more of a training error, a validation error, the derivatives, and combinations thereof associated with the corresponding machine-learning models and time-series models.
At step 1103, the parameters (α1, α2) of step 1102 are used to compute the final forecast result (fc) as:
More specifically, step 1104 refers to adjusting the high time scale forecast S1(n) based on the low time scale (low granularity) forecasts S1A and S2 to compute the final forecast result (fc) as the high time scale forecast.
The present subject matter accordingly provides a comprehensive system architecture that transforms and unifies predictor data on different timescales through the transformation module 404. Instead of a point forecast, the proposed approach produces a forecast interval by generating a probability distribution of the demand forecast through an ensemble of many predictive models, as provided by the demand forecast module 406, and the optimal forecast percentile is chosen by the ensemble learning module 408. As a result, the present subject matter is robust and adaptive, and addresses the uncertainties generated in the demand forecast module 406 due to the unification of regressors at different timescales. Moreover, such an approach enables the incorporation of domain expertise.
Overall, the ML-based system in accordance with the present subject matter accepts heterogeneous-timescale series data. In particular, index data with a lower timescale than that of the target product is allowed as input, and accordingly improved forecast accuracy is provided by maximizing the use of valuable index data. By transforming index data from a low timescale to a high timescale (e.g., quarterly to monthly, or yearly to monthly), forecasts may be made on a uniform timescale.
Based on the proposed approach of ensembling different-timescale data into uniform high-timescale data, more accurate forecasts may be obtained by adaptively ensembling forecasts from multiple ML models with different timescales.
In addition, the present subject matter proposes an interface that allows a user, such as a domain expert, to tune key parameters, such as the optimal percentile setting, to improve the forecast under high-fluctuation scenarios.
In a networked deployment, the computer system 1000 may operate in the capacity of a server or as a client user computer in a server-client user network environment, or as a peer computer system in a peer-to-peer (or distributed) network environment. The computer system 1000 can also be implemented as or incorporated into various devices, such as a personal computer (PC), a tablet PC, a personal digital assistant (PDA), a mobile device, a palmtop computer, a laptop computer, a desktop computer, a communications device, a wireless telephone, a land-line telephone, a web appliance, a network router, switch or bridge, or any other machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single computer system 1000 is illustrated, the term “system” shall also be taken to include any collection of systems or sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions.
The computer system 1000 may include a processor 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both). The processor 1002 may be a component in a variety of systems. For example, the processor 1002 may be part of a standard personal computer or a workstation. The processor 1002 may be one or more general processors, digital signal processors, application specific integrated circuits, field programmable gate arrays, servers, networks, digital circuits, analog circuits, combinations thereof, or other now known or later developed devices for analyzing and processing data. The processor 1002 may implement a software program, such as code generated manually (i.e., programmed).
The computer system 1000 may include a memory 1004, such as a memory 1004 that can communicate via a bus 1008. The memory 1004 may be a main memory, a static memory, or a dynamic memory. The memory 1004 may include, but is not limited to computer readable storage media such as various types of volatile and non-volatile storage media, including but not limited to random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like. In one example, the memory 1004 includes a cache or random access memory for the processor 1002. In alternative examples, the memory 1004 is separate from the processor 1002, such as a cache memory of a processor, the system memory, or other memory. The memory 1004 may be an external storage device or database for storing data. Examples include a hard drive, compact disc (“CD”), digital video disc (“DVD”), memory card, memory stick, floppy disc, universal serial bus (“USB”) memory device, or any other device operative to store data. The memory 1004 is operable to store instructions executable by the processor 1002. The functions, acts or tasks illustrated in the figures or described may be performed by the programmed processor 1002 executing the instructions stored in the memory 1004. The functions, acts or tasks are independent of the particular type of instructions set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firm-ware, micro-code and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing and the like.
As shown, the computer system 1000 may or may not further include a display unit 1010, such as a liquid crystal display (LCD), an organic light emitting diode (OLED), a flat panel display, a solid state display, a cathode ray tube (CRT), a projector, a printer or other now known or later developed display device for outputting determined information. The display 1010 may act as an interface for the user to see the functioning of the processor 1002, or specifically as an interface with the software stored in the memory 1004 or in the drive unit 1016.
Additionally, the computer system 1000 may include an input device 1012 configured to allow a user to interact with any of the components of system 1000. The input device 1012 may be a number pad, a keyboard, or a cursor control device, such as a mouse, or a joystick, touch screen display, remote control or any other device operative to interact with the computer system 1000.
The computer system 1000 may also include a disk or optical drive unit 1016. The disk drive unit 1016 may include a computer-readable medium 1022 in which one or more sets of instructions 1024, e.g. software, can be embedded. Further, the instructions 1024 may embody one or more of the methods or logic as described. In a particular example, the instructions 1024 may reside completely, or at least partially, within the memory 1004 or within the processor 1002 during execution by the computer system 1000. The memory 1004 and the processor 1002 also may include computer-readable media as discussed above.
The present invention contemplates a computer-readable medium that includes instructions 1024 or receives and executes instructions 1024 responsive to a propagated signal so that a device connected to a network 1026 can communicate voice, video, audio, images or any other data over the network 1026. Further, the instructions 1024 may be transmitted or received over the network 1026 via a communication port or interface 1020 or using a bus 1008. The communication port or interface 1020 may be a part of the processor 1002 or may be a separate component. The communication port 1020 may be created in software or may be a physical connection in hardware. The communication port 1020 may be configured to connect with a network 1026, external media, the display 1010, or any other components in system 1000 or combinations thereof. The connection with the network 1026 may be a physical connection, such as a wired Ethernet connection or may be established wirelessly as discussed later. Likewise, the additional connections with other components of the system 1000 may be physical connections or may be established wirelessly. The network 1026 may alternatively be directly connected to the bus 1008.
The network 1026 may include wired networks, wireless networks, Ethernet AVB networks, or combinations thereof. The wireless network may be a cellular telephone network, an 802.11, 802.16, 802.20, 802.1Q or WiMax network. Further, the network 1026 may be a public network, such as the Internet, a private network, such as an intranet, or combinations thereof, and may utilize a variety of networking protocols now available or later developed including, but not limited to TCP/IP based networking protocols.
In an alternative example, dedicated hardware implementations, such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement various parts of the system 1000.
Terms used in this disclosure and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including, but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes, but is not limited to,” etc.).
Additionally, if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation, no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations.
In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” or “one or more of A, B, and C, etc.” is used, in general such a construction is intended to include A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together, etc. For example, the use of the term “and/or” is intended to be construed in this manner.
Further, any disjunctive word or phrase presenting two or more alternative terms, whether in the description of embodiments, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms or both terms. For example, the phrase “A or B” should be understood to include the possibilities of “A” or “B” or “A and B.”
All examples and conditional language recited in this disclosure are intended for pedagogical objects to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art and are to be construed as being without limitation to such specifically recited examples and conditions. Although embodiments of the present disclosure have been described in detail, it should be understood that various changes, substitutions, and alterations could be made thereto without departing from the spirit and scope of the present disclosure.