The technology described herein relates generally to data modeling and more specifically to computer-implemented scenario analysis.
Data analysts are often charged with the task of predicting what effect a given future event might have on an organization. Similarly, they may also seek to determine what events need to happen between now and a future time for an organization to reach a specified goal. They may be further tasked to discover how certain contributing factors can be manipulated to maximize a metric for an organization, such as sales or inventory turnover or profit. Statistical data models may be used as tools in aiding data analysts in making these and other determinations. However, the selection of data models to make these determinations is difficult, especially when ensuring the best probability of a successful prediction.
In accordance with the teachings provided herein, computer-implemented systems and methods are provided for implementing a scenario analysis manager that performs multiple scenarios based upon time series data that is representative of transactional data. For example, a system and method provides a set of candidate predictive models for a first scenario for selection where the set of candidate predictive models includes an identification of which variables are associated with a model. Model selection data is received from a scenario analysis manager where a selected model is configured to predict a future value of a first variable based at least in part on values of a second variable. Time series data is received representative of past transaction activity of the first variable and the second variable, and data representative of a future value of the second variable is also received. The future value of the first variable is determined using the selected model, the time-series data and the future value of the second variable.
As another example, a system and method is provided for storing data for access by a scenario analysis management application program that performs multiple scenarios within a project based upon time series data that is representative of transactional data. A memory may include one or more data structures stored in the memory that include information used by the application program that includes project records containing one-to-many scenario links between the project and one or more scenarios associated with the project. The memory may further include scenario records containing one-to-one model links between a scenario and a model associated with the scenario and model records containing one-to-many predictive variable links between a model and one or more variables associated with the model. The memory may also include past value records containing time series data representative of transaction activity involving a first variable and a second variable, where the first variable and the second variable are associated with a model through the predictive variable links. Future value records identifying one or more future values of the second variable may also be included as well as a scenario value record for storage of a future value of the first variable determined using a model identified by a scenario record that calculates the scenario value using past record values and a future value record. The scenario analysis manager may compute the future value of the first variable for each scenario in a project, each scenario being identified by a scenario link, the computing using a model associated with the scenario, the model being identified by a model link, where the model receives as inputs past values of the first variable and the second variable and the future value of the second variable to compute the future value of the first variable. The scenario analysis manager may display the future value of the first variable for multiple scenarios simultaneously.
As another example, a computer display device is provided for generating a scenario analysis manager graphical user interface that displays a future value of a first variable for multiple scenarios simultaneously. The graphical user interface may include a project definition display region for defining one or more scenarios associated with a project and a model selection region for providing a set of candidate models for selection and for receiving selection data, where a selected model is configured to predict a future value of a first variable based at least in part on past values of a second variable, and where the model selection region identifies one or more independent and dependent variables associated with a model. The graphical user interface may further include a future value definition region for receiving data defining one or more future values of the second variable and a scenario display region for providing a graphical depiction of a calculated future value of the first variable determined using the selected model, past values of the first variable, past values of the second variable, and the one or more future values of the second variable, where the scenario display region displays the future value of the first variable for multiple scenarios simultaneously.
As a further example, a computer-implemented method is provided for implementing a scenario analysis manager that performs multiple scenarios based upon time series data that is representative of transactional data and displays results of the multiple scenarios simultaneously. The method may include providing a set of candidate predictive models for a first scenario for selection where the set of candidate predictive models includes an identification of which variables are associated with a model. Model selection data can be received where a selected model is configured to predict a future value of a first variable based at least in part on values of a second variable, and time-series data representative of past transaction activity of the first variable and the second variable can be received from a computer-readable memory. Data representative of a future value of the second variable may be received and the future value of the first variable may be determined using the selected model, the time-series data, and the future value of the second variable. The future value of the first variable for the first scenario may be stored in a computer-readable memory, and the future value of the first variable may be displayed simultaneously with a future value of a second scenario.
As an additional example, a computer-implemented system is provided for implementing a scenario analysis manager that performs multiple scenarios based upon time series data that is representative of transactional data and displays results of the multiple scenarios simultaneously. The system may include a processing system including at least one data processor and a computer-readable memory coupled to the processing system. Software instructions may be configured to execute steps that include providing a set of candidate predictive models for a first scenario for selection where the set of candidate predictive models includes an identification of which variables are associated with a model and receiving model selection data where a selected model is configured to predict a future value of a first variable based at least in part on values of a second variable. The software instructions may be further configured to receive time-series data from a computer-readable memory representative of past transaction activity of the first variable and the second variable and receive data representative of a future value of the second variable. The software instructions may also be configured to determine the future value of the first variable using the selected model, the time-series data, and the future value of the second variable and store the future value of the first variable for the first scenario in a computer-readable memory. The future value of the first variable can be displayed simultaneously with a future value of a second scenario.
The scenario analysis manager 104 computes one or more future values for a first variable based on past values of the first variable and a second variable as well as proposed future values of the second variable. The scenario analysis manager 104 further enables the creation and simultaneous comparison of multiple scenarios which may vary in the data model used, input variables considered, past time-series data considered, future predicted value inputs, as well as many other factors. For example, a user 102 may want to hypothesize the effect that a change in product pricing may have on weekly profits for a region of retail stores. After selection of a model, past time-series data relating price and profits for the region of retail stores is provided to the model for training, such as via linear regression or other statistical processes. Data is then input as to one or more future values of product price. For example, one desired scenario may raise prices 10%, one scenario may lower prices 15%, one scenario may lower prices by 5% each week for 3 weeks, and one scenario may keep prices the same. The scenario analysis manager receives this future hypothetical data for the second variable, price, and determines predicted values for the first variable, regional profit, for each of the desired scenarios. Each of the predicted values for each of the scenarios may be presented for the three weeks in the form of a line graph or other type of graph. A user 102 may select one of the scenario predictions that the user thinks is most likely to match future results and may persist that prediction as the forecast to be used in other calculations. For example, a user 102 may decide to lower prices 15% and may, thus, select the scenario prediction associated with the 15% price reduction as the forecast for profits for the associated region.
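By way of illustration only, the pricing example above may be sketched as follows. The sketch assumes a single-variable ordinary least squares model, and the function names `fit_line` and `run_scenarios`, the scenario labels, and all data values are hypothetical and are not taken from the described system.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def run_scenarios(past_price, past_profit, future_prices_by_scenario):
    """Train once on past data, then predict profit for every scenario."""
    intercept, slope = fit_line(past_price, past_profit)
    return {
        name: [intercept + slope * price for price in prices]
        for name, prices in future_prices_by_scenario.items()
    }

# Made-up weekly history (an assumption): profit = 700 - 20 * price.
past_price = [10.0, 11.0, 12.0, 13.0]
past_profit = [500.0, 480.0, 460.0, 440.0]
current = past_price[-1]

# The four scenarios from the text, each covering three future weeks.
scenarios = {
    "raise 10%": [current * 1.10] * 3,
    "lower 15%": [current * 0.85] * 3,
    "lower 5% per week": [current * 0.95 ** week for week in (1, 2, 3)],
    "keep price": [current] * 3,
}

forecasts = run_scenarios(past_price, past_profit, scenarios)
for name, values in forecasts.items():
    print(name, [round(v, 2) for v in values])
```

In this sketch every scenario shares one trained model, so the scenarios differ only in the future values supplied for the price variable. A scenario could equally select a different model or a different span of past data.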
A scenario provides a determination as to how a generated forecast may change when one manipulates the future values of one or more independent or dependent variables. A scenario analysis manager may be configured to perform one or more of a variety of functions related to a given scenario. In a first mode of operation, as described above, a prediction of a future value of a first dependent variable is generated by the scenario analysis manager based on past values of the first dependent variable, past values of one or more independent variables, and one or more predicted values of the one or more independent variables. In a second, goal-seeking mode of operation, future values for one or more independent variables are determined to reach a desired value for a dependent variable. For example, in a goal-seeking operation, the scenario analysis manager may determine a future value of the independent price variable that will yield a desired future value of the dependent profit variable, based on past values of the price and profit variables and the desired future value of the profit variable. The scenario analysis manager may also perform in an optimization mode where one or more first future variable values are determined to maximize or minimize a second future value. For example, a scenario analysis manager may determine future values of independent variables price and advertising expenditures to maximize the profit dependent variable.
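As a non-limiting sketch, the goal-seeking and optimization modes may be illustrated with already-trained models. The sketch assumes bisection for goal seeking, which requires a model that is monotonic over the search interval, and an exhaustive grid search for optimization. Both techniques, the helper names, and the two profit functions are illustrative assumptions rather than the described implementation.

```python
from itertools import product

def goal_seek(model, target, lo, hi, tol=1e-9):
    """Bisection: find x in [lo, hi] such that model(x) is close to target.

    Assumes the model is monotonic on the interval (an assumption of this sketch).
    """
    f_lo = model(lo) - target
    f_hi = model(hi) - target
    if f_lo * f_hi > 0:
        raise ValueError("target is not reachable within [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = model(mid) - target
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

def optimize(model, grid):
    """Return the grid point (a tuple of inputs) that maximizes the model."""
    return max(grid, key=lambda point: model(*point))

# Hypothetical trained models, not taken from the source.
def linear_profit(price):
    return 700.0 - 20.0 * price

def profit(price, advertising):
    return (price - 4) * (100 - 5 * price) + 10 * advertising - advertising ** 2

# Goal seeking: which price yields a profit of 500?
price_for_goal = goal_seek(linear_profit, target=500.0, lo=5.0, hi=20.0)

# Optimization: which (price, advertising) pair maximizes profit?
best_price, best_advertising = optimize(profit, product(range(8, 17), range(0, 11)))
print(round(price_for_goal, 6), best_price, best_advertising)
```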
A scenario analysis manager may also be utilized in the testing of models using hold-out data. Hold-out data consists of a set of past data that is not used in training a model but is instead used in testing the accuracy of a model. Thus, a predicted value for a first variable may be determined by the scenario analysis manager based on a first set of past data for the first variable and a second variable as well as a second set of hold-out past data for the second variable, where the second set of hold-out past data is subsequent to the first set of past data. Thus, the known, hold-out data for the second variable is treated as a “future” value of the second variable. The scenario analysis manager then determines a predicted “future” value of the first variable based on the “future” hold-out data for the second variable. The predicted “future” value of the first variable determined by the scenario analysis manager may then be compared to the actual hold-out data for the first variable to determine the accuracy of the data model compared to real-life results.
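The hold-out procedure may be sketched as follows, assuming a simple least squares model and mean absolute error as the accuracy measure. The error measure, the function names, and the data are hypothetical choices made for this illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def holdout_error(train_x, train_y, holdout_x, holdout_y):
    """Train on the first set, treat hold-out x values as 'future' inputs,
    and compare the predictions with the actual hold-out y values."""
    intercept, slope = fit_line(train_x, train_y)
    predicted = [intercept + slope * x for x in holdout_x]
    errors = [abs(p - actual) for p, actual in zip(predicted, holdout_y)]
    return predicted, mean(errors)

# Made-up data (an assumption): the last two periods are held out of training.
train_price, train_profit = [10.0, 11.0, 12.0, 13.0], [500.0, 480.0, 460.0, 440.0]
holdout_price, holdout_profit = [14.0, 15.0], [425.0, 395.0]

predicted, mae = holdout_error(train_price, train_profit, holdout_price, holdout_profit)
print(predicted, mae)
```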
With reference back to
The scenario analysis manager provides for quick and easy selection of one or more models for a project, selection of input data and future scenario data for utilization by the selected models, and execution of efficient and accurate scenario determinations by managing a number of data structures describing the state of a project.
Each of the data structures 306, 314, 318 may also contain other information about certain entities at their level. For example, the scenario data structure 306 may include data on each scenario 304 such as a scenario name, a scenario date of creation, a scenario description, as well as other data. The models data structure 314 may contain data on each model 310 such as a model name, a model date of creation, a model description, a model input data type, a model output data type, as well as other data. The input variables data structure 318 may contain data on each input variable 316 such as an input variable name, an input variable type, an input variable description, as well as other data.
Each scenario record 408 identifies a model associated with the scenario via a model link 410 contained in the scenario record 408. The scenario analysis manager 402 further manages one or more model records 412. A model record contains one-to-many input variable links 414 between a model identified by the model record 412 and one or more variables associated with the model. The scenario analysis manager 402 further administers a plurality of past/future value records 416. The past/future records may, for example, contain time series data associated with the variables identified by the input variable links 414 associated with a model record 412. The past/future value records may include past and/or predicted future values of dependent and/or independent variables referenced by a model record 412. The scenario analysis manager 402 may also control scenario values 418 that contain future values determined by the scenario analysis manager 402 in running a scenario analysis identified by the project records 404, scenario records 408, model records 412, and past/future value records 416.
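One possible in-memory arrangement of these records is sketched below for illustration. The class names, field names, and the `ratio_model` placeholder are assumptions of this sketch. The description above specifies only the links between records, not a concrete layout or model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

@dataclass
class ProjectRecord:
    project_id: str
    scenario_links: List[str]        # one-to-many: project -> scenario IDs

@dataclass
class ScenarioRecord:
    scenario_id: str
    model_link: str                  # one-to-one: scenario -> model ID

@dataclass
class ModelRecord:
    model_id: str
    input_variable_links: List[str]  # one-to-many: model -> variable IDs
    predict: Callable[[Dict[str, List[float]], Dict[str, List[float]]], List[float]]

def ratio_model(past, future):
    """Placeholder model: scale future price by the historical profit/price ratio."""
    ratio = mean(past["profit"]) / mean(past["price"])
    return [ratio * price for price in future["price"]]

def run_project(project, scenarios, models, past_values, future_values):
    """Follow scenario and model links, then compute each scenario's values."""
    scenario_values = {}
    for scenario_id in project.scenario_links:
        model = models[scenarios[scenario_id].model_link]
        past = {v: past_values[v] for v in model.input_variable_links}
        scenario_values[scenario_id] = model.predict(past, future_values[scenario_id])
    return scenario_values

project = ProjectRecord("p1", ["s1", "s2"])
scenarios = {s: ScenarioRecord(s, "m1") for s in project.scenario_links}
models = {"m1": ModelRecord("m1", ["price", "profit"], ratio_model)}
past_values = {"price": [10.0, 10.0], "profit": [400.0, 400.0]}
future_values = {"s1": {"price": [11.0]}, "s2": {"price": [9.0]}}

scenario_values = run_project(project, scenarios, models, past_values, future_values)
print(scenario_values)
```

Keeping the links as identifiers rather than nested objects mirrors the record-and-link organization described above and allows several scenarios to share one model record.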
The scenario analysis manager 502 may further manage descriptive tables and records that provide information describing entities at each level (i.e., project level, scenario level, model level, variable level). The descriptive information may be incorporated into the links records described above or may be broken into separate data structures as shown in
The scenario analysis manager 502 may also manage desired manipulations to future values of the predictive variables. For example, as described above, one scenario may reduce a price variable by 10% per week for three weeks to examine the effect on regional profits. Such a manipulation may be stored in a scenario-manipulation table 532 that identifies one or more future manipulations to be made to a predictive variable over one or more future time periods. A scenario-manipulation table 532 may store the desired manipulation 534 (e.g., set a predictive variable, temperature, to 80 degrees Fahrenheit for future time period number 1, in predicting amusement park attendance) by scenario ID 510 and predictive variable ID 522. Records of the scenario-manipulation table may also be indexed by a manipulation index (not shown) which may be linked from the scenario-model links table 512 or other location.
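For illustration, rows of a scenario-manipulation table and their application to a predictive variable may be sketched as follows. The row layout, the two operation names `set` and `pct_change`, and the rule that a manipulated value carries forward until the next manipulation are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Manipulation:
    scenario_id: str
    variable_id: str
    period: int      # 1-based future time period
    op: str          # "set" or "pct_change" (hypothetical operation names)
    amount: float

def future_values(last_value: float, n_periods: int,
                  rows: List[Manipulation]) -> List[float]:
    """Apply one scenario's manipulations for one variable, period by period."""
    by_period = {row.period: row for row in rows}
    values, current = [], last_value
    for period in range(1, n_periods + 1):
        row = by_period.get(period)
        if row is not None:
            if row.op == "set":
                current = row.amount
            elif row.op == "pct_change":
                current = current * (1 + row.amount / 100)
            else:
                raise ValueError(f"unknown operation: {row.op}")
        values.append(current)  # value carries forward when no row applies
    return values

# Reduce price by 10% per week for three weeks.
price_rows = [Manipulation("s1", "price", week, "pct_change", -10.0) for week in (1, 2, 3)]
# Set temperature to 80 degrees Fahrenheit for future time period 1.
temp_rows = [Manipulation("s2", "temperature", 1, "set", 80.0)]

prices = future_values(100.0, 3, price_rows)
temperatures = future_values(75.0, 2, temp_rows)
print(prices, temperatures)
```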
Further, a new scenario interface 600 includes a model selection interface area 606 for displaying models and associated information about the models and for accepting selection of a model to associate with the scenario. The model selection interface area 606 provides data about a set of models available for selection for a scenario. Data provided may include a name and model type. The models may be ranked, as shown at 608, based on a quality metric. The quality metric may be based on one or more of a number of factors including prior user recommendations, hold-out data testing accuracy, percentage of times data from the model is persisted as a permanent forecast, as well as others. The model selection interface area 606 also may offer data regarding variables associated with each model, as shown at 610. The associated variables data 610 aids a user in selecting a model by identifying the variables that may be predicted by a model as well as the variables to which a model is sensitive. Thus, if one wishes to analyze the effect of temperature on amusement park attendance, then one would use the variables data 610 to narrow selection choices to those models that are sensitive to the temperature variable. A new scenario interface 600 may also include a quick view 612 indicator for providing expanded information related to the model selection interface area 606.
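The narrowing and ranking of candidate models may be sketched as follows. The sketch assumes each model carries a single precomputed quality score, and the model names, types, scores, and field names are invented for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class CandidateModel:
    name: str
    model_type: str
    quality: float                  # hypothetical composite quality score
    predicts: Tuple[str, ...]       # dependent variables the model can predict
    sensitive_to: Tuple[str, ...]   # independent variables the model responds to

def candidates_for(models: List[CandidateModel], dependent: str,
                   independent: str) -> List[CandidateModel]:
    """Keep models linking the two variables, best quality score first."""
    matching = [m for m in models
                if dependent in m.predicts and independent in m.sensitive_to]
    return sorted(matching, key=lambda m: m.quality, reverse=True)

models = [
    CandidateModel("A", "regression", 0.72, ("attendance",), ("temperature", "price")),
    CandidateModel("B", "regression", 0.91, ("attendance",), ("price",)),
    CandidateModel("C", "exponential smoothing", 0.85, ("attendance",), ("temperature",)),
]

ranked = candidates_for(models, dependent="attendance", independent="temperature")
print([m.name for m in ranked])
```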
A disk controller 1660 interfaces one or more optional disk drives to the system bus 1652. These disk drives may be external or internal floppy disk drives such as 1662, external or internal CD-ROM, CD-R, CD-RW or DVD drives such as 1664, or external or internal hard drives 1666. As indicated previously, these various disk drives and disk controllers are optional devices.
Each of the element managers, real-time data buffer, conveyors, file input processor, database index shared access memory loader, reference data buffer and data managers may include a software application stored in one or more of the disk drives connected to the disk controller 1660, the ROM 1656 and/or the RAM 1658. Preferably, the processor 1654 may access each component as required.
A display interface 1668 may permit information from the bus 1652 to be displayed on a display 1670 in audio, graphic, or alphanumeric format. Communication with external devices may optionally occur using various communication ports 1672.
In addition to the standard computer-type components, the hardware may also include data input devices, such as a keyboard 1672, or other input device 1674, such as a microphone, remote control, pointer, mouse and/or joystick.
U.S. patent application Ser. No. 11/432,127, entitled “Computer-Implemented Systems and Methods for Defining Events,” describes systems and methods for defining events; the entirety of which is herein incorporated by reference. U.S. patent application Ser. No. 11/431,123, entitled “Computer-Implemented Systems and Methods For Storing Data Analysis Models,” describes systems and methods for storing data analysis models; the entirety of which is herein incorporated by reference. U.S. Pat. No. 7,251,589, entitled “Computer-Implemented System and Method For Generating Forecasts,” describes systems and methods for generating forecasts; the entirety of which is herein incorporated by reference.
This written description uses examples to disclose the invention, including the best mode, and also to enable a person skilled in the art to make and use the invention. The patentable scope of the invention may include other examples. For example, the systems and methods may include data signals conveyed via networks (e.g., local area network, wide area network, internet, combinations thereof, etc.), fiber optic medium, carrier waves, wireless networks, etc. for communication with one or more data processing devices. The data signals can carry any or all of the data disclosed herein that is provided to or from a device.
Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods and operations described herein. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
The systems' and methods' data (e.g., associations, mappings, data input, data output, intermediate data results, final data results, etc.) may be stored and implemented in one or more different types of computer-implemented data stores, such as different types of storage devices and programming constructs (e.g., RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
It should be understood that as used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of “and” and “or” include both the conjunctive and disjunctive and may be used interchangeably unless the context expressly dictates otherwise; the phrase “exclusive or” may be used to indicate a situation where only the disjunctive meaning may apply.
This application is a continuation application of U.S. patent application Ser. No. 12/611,497 filed Nov. 3, 2009, entitled “Computer-Implemented Systems and Methods for Scenario Analysis,” the entirety of which is herein incorporated by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 12611497 | Nov 2009 | US |
Child | 13772200 | | US |