The present invention relates in general to data analysis and more particularly to a method and apparatus incorporating one or more of a logistic regression, computational analysis, and decision matrix into a visual analysis tool.
Conventional systems and devices for statistical analysis typically employ computing systems to perform computations in a relatively short period of time. These conventional approaches to statistical analysis are typically based on determining particular quantities or values (e.g., mean, median, mode, percent change). These values may be used for analysis of past events and/or performances. While these values may be useful for characterizing past results, the values usually provide little to no insight into future performance. Additionally, these values may not readily present correlations between computed data. Translating computed values of past performance into charts or graphs may be one means of conveying results visually.
Conventional systems and methods employ many types of modeling techniques and statistical tools for correlating data. These correlations may be employed for many uses. Identifying correlation relationships between two or more variables is rarely an easy task, as one or more computed results may not provide a clear indication for making a decision. Further, employing the conventional methods for decision making based on processed data may also be difficult. Accordingly, there is a need for a method and apparatus that addresses one or more of the aforementioned drawbacks.
Disclosed and claimed herein are a method and apparatus for analyzing data to provide decision making information. In one embodiment, a method includes receiving data corresponding to an agent for one or more predictor variables of a model, and calculating coefficients of the model based, at least in part, on a logistic regression analysis for a response variable to determine probability densities of the response variable, wherein the response variable is associated with the one or more predictor variables. The method further includes performing a computational analysis of the response variable based on the probability densities of the response variable to determine variation in the probability densities of the response variable. The method further includes generating a decision matrix reflecting probabilities of one or more response variables and analysis values.
Other aspects, features, and techniques of the invention will be apparent to one skilled in the relevant art in view of the following detailed description of the invention.
One aspect of the present invention is directed to analyzing data to provide decision making information. In one embodiment, a method may be provided for translating behaviors into a visual decision package. The method may include a combination of statistical predictions, financial (e.g., economic utility) analysis, and game theory to provide a comprehensive decision analysis tool. The method may be based on a model, wherein coefficients of the model may be calculated based on a logistic regression analysis. Probability densities of response variables may be determined, wherein the response variables may be associated with the one or more predictor variables. The method may further include performing a computational analysis, such as a Monte Carlo simulation. In that fashion, probability densities of the response variable may be used to determine variation in one or more response variables. The method may include generating a decision matrix using game theory, wherein probabilities of one or more response variables may be presented with one or more analysis values. In that fashion, business decisions and strategies may be determined based on the generated results.
According to another embodiment, a process may be provided for selecting one or more agents or financial instruments based on a threshold value. Selection may be based on a scenario, model and/or user defined attributes. As used herein, an agent may correspond to one or more parties whose actions may be employed in one or more of the logistic regression analysis, computational analysis and modeling.
When implemented in software, the elements of the invention are essentially the code segments to perform the necessary tasks. The program or code segments can be stored in a processor readable medium. The “processor readable medium” may include any medium that can store or transfer information. Examples of the processor readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory or other non-volatile memory, a floppy diskette, a CD-ROM, an optical disk, a hard disk, a fiber optic medium, etc. The code segments may be downloaded via computer networks such as the Internet, Intranet, etc.
Referring now to the drawings,
According to another embodiment, analysis tool 110 may be employed for providing predictive analysis information based on statistical predictions, financial analysis and/or game theory. Analysis tool 110 may be used for one or more applications, such as a payment default predictor, marketable opportunities, billing defaults, subscription services, mortgage loan servicing, internet associations (e.g., dating sites), donation fundraising, market segmentation analysis, portfolio management, strategic innovation analysis, etc. To that end, analysis tool 110 may provide decision making information 115 for one or more applications. Decision making information 115 may be provided to a user of analysis tool 110 as a graphical image, graphical diagram and/or printed output.
Referring now to
I/O interface 215 of analysis tool 200 may be configured to receive and/or output data. For example, a user may employ I/O interface 215 to provide data which may be utilized by processor 205 to determine decision making information. Analysis tool 200 may further include optional display 220 for outputting decision making information. It may also be appreciated that analysis tool 200 may be coupled to an external display (not shown) by I/O interface 215.
Referring now to
Referring now to
According to one embodiment, process 400 may be utilized to analyze scenarios wherein two or more agents are making decisions, for example, a mortgage scenario where one agent is a lender and another agent is the mortgage holder. Further, process 400 may be employed to determine the likelihood of the occurrence of a dependent variable (e.g., mortgage default in
Based on a posteriori data received at block 405, logistic regression modeling may be used to generate model weights at block 410. Creation of a model may be initiated by assuming that the conditional expectation of a dependent variable or response variable Y is equal to a linear combination of independent variables and coefficients $X^{T}\beta$ such that:
$$Y = X^{T}\beta + \varepsilon \qquad \text{(Equation 1)}$$
where ε is a general error term.
Based on the business application the model will address, the response variable Y is discrete and follows the binomial distribution. A binomial distribution may be described as follows, where Y is the number of successes, n is the number of trials and π is the probability of success:
$$P(Y = y) = \binom{n}{y}\,\pi^{y}\,(1-\pi)^{n-y} \qquad \text{(Equation 2)}$$
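By way of a non-limiting illustration, the binomial probability of observing a given number of successes may be evaluated numerically. The following Python sketch is an assumption made for illustration only; the function name `binomial_pmf` and its parameter names do not appear in the original disclosure.

```python
from math import comb


def binomial_pmf(y: int, n: int, pi: float) -> float:
    """Probability of exactly y successes in n trials with success probability pi."""
    if not 0 <= y <= n:
        return 0.0  # outside the support of the distribution
    return comb(n, y) * pi ** y * (1.0 - pi) ** (n - y)


# Hypothetical usage: chance that exactly 2 of 10 loans default
# when each defaults with probability 0.1.
p_two_defaults = binomial_pmf(2, 10, 0.1)
```

Summing `binomial_pmf(y, n, pi)` over y from 0 to n should give 1, which is a quick sanity check on any chosen n and pi.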
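The link function for the binomial distribution is the logit function; that is, the logit of the unknown probability π of the response variable Y is modeled as a linear function of the independent variables X. The logit function for the binomial distribution may be described as:
$$\operatorname{logit}(\pi) = \ln\!\left(\frac{\pi}{1-\pi}\right) = X^{T}\beta \qquad \text{(Equation 3)}$$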
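As a minimal sketch of how the logit link connects a linear combination of predictors to a probability, the following Python functions compute the logit, its inverse, and a predicted probability. The names `logit`, `inv_logit`, and `predicted_probability` are hypothetical and are used only for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Inverse of the logit; written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def predicted_probability(x, beta) -> float:
    """Probability implied by the linear predictor sum(x_j * beta_j)."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return inv_logit(eta)
```

Here `x` is assumed to include a leading 1.0 when the model has an intercept, so that `beta[0]` plays the role of the constant term.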
Model weights may relate to the coefficients described in equation 3. At block 415, data may be collected based on the scenario modeled. For example, data collected at block 415 may include loan pool data, economic condition data, demographic data and investment cost. Based on model weights generated at block 410 and data collected at block 415, probabilities may be calculated for each response at block 420. Coefficients of the model, variation around the coefficients, and variations around the independent variables may be used to calculate probabilities for various responses of the dependent variable at block 420. Calculating probabilities at block 420 may include determining a second order log-likelihood of the coefficients (equation 4).
The standard error of the coefficients may be determined by taking the square root of the inverse of the second order log-likelihood of the coefficients (equation 5).
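The coefficient and standard error calculations may be sketched as follows. This example assumes a single predictor plus an intercept and uses Newton-Raphson iteration, which is a common fitting technique but is not named in the original disclosure. Standard errors are taken as the square roots of the diagonal of the inverse of the negative second derivative of the log-likelihood. The function names are hypothetical.

```python
import math


def _score_and_information(xs, ys, b0, b1):
    """Gradient of the log-likelihood and the negative second derivative."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def fit_logistic(xs, ys, iterations=25):
    """Fit logit(pi) = b0 + b1 * x; returns ((b0, b1), (se0, se1)).

    Assumes the data are not perfectly separable and that xs takes at
    least two distinct values; otherwise the 2x2 system is singular.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = _score_and_information(xs, ys, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard errors from the information evaluated at the final estimate.
    _, (h00, h01, h11) = _score_and_information(xs, ys, b0, b1)
    det = h00 * h11 - h01 * h01
    return (b0, b1), (math.sqrt(h11 / det), math.sqrt(h00 / det))
```

A general implementation would handle several predictors with a matrix solver; the two-parameter case is used here only to keep the inverse explicit.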
The probability of a response of each variable thus may be described by equations 6-8.
In the logistic regression, it may generally be assumed that the observations are independently distributed (i.e., cases are independent), that the distribution of $Y_i$ is $\mathrm{Bin}(n_i, \pi_i)$ (i.e., binary logistic regression assumes a binomial distribution of the response), and that there is a linear relationship between the logit of the response and the independent variables. Further, in the logistic regression, homogeneity of variance does not need to be satisfied, and errors need not be normally distributed but must be independent. While the foregoing equations are described as calculating probabilities of expectation of a dependent variable, it may be appreciated that other equations may be employed or incorporated.
At block 425, calculated probabilities may be output for further analysis. As will be described below in more detail with respect to
Referring now to
At block 510, business actions may be defined for the model. In one embodiment, business actions may correspond to one or more actions which may be performed by a business associated with customer actions defined at block 505. Prediction variables may be defined at block 515, including but not limited to loan-to-value (LTV), combined loan-to-value (cLTV) and/or debt-to-income ratio (DTI).
At block 520, data may be imported by the analysis tool. The analysis tool may code data at block 525. Coding data can include assigning a sequential number to an attribute or test data. The number may be assigned based on the variables that contribute to the response. For example, a simple model can be used to relate a change in a categorical response or dependent variable to a change in an ordered input or independent variable. As such, data may be partitioned for processing by a field. Further at block 525, a model may be created based on a logistic regression according to one or more embodiments of the invention and as described above in
Process 500 may further include the optional act of optimizing the model in block 535. Model optimization may be based on specific scenarios which may be modeled and/or on correcting a particular model. The model may be optimized to include additional and/or different actions performed by an agent.
Referring now to
At block 615, received customer data may be coded by the analysis tool for application to the selected model. Based on coded data at block 615, coefficients may be calculated for the model at block 620. Probabilities of one or more responses or dependent variables may be calculated at block 625 by the analysis tool. Financials may then be calculated at block 630. By way of example, the analysis tool can calculate net present value (NPV) based on the model results and likely net outcomes, using a traditional method of discounting cash flow over a user defined time horizon while additionally taking into account the likelihood of responses at each time frame. The analysis tool can display one or more of a calculated NPV, adjusted NPV and likely NPV for one or more outcomes modeled by the analysis tool. It may also be appreciated that financials calculated at block 630 may correspond to other quantities and/or indicators and are not limited to NPV.
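One possible reading of the NPV calculation at block 630 is sketched below: cash flows are discounted period by period, and a likely NPV weights each period's possible cash flows by the probability of the corresponding response. The function names, the data layout, and the convention that period 0 is undiscounted are assumptions made for illustration and are not taken from the disclosure.

```python
def npv(cash_flows, rate):
    """Discount cash_flows[t], received at period t, at a per-period rate."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def likely_npv(period_outcomes, rate):
    """Probability-weighted NPV.

    period_outcomes[t] is a list of (probability, cash_flow) pairs, one
    pair per mutually exclusive response in period t.
    """
    expected = [sum(p * cf for p, cf in outcomes) for outcomes in period_outcomes]
    return npv(expected, rate)


# Hypothetical example: pay 100 now; next period receive 110 with
# probability 0.9 or nothing with probability 0.1, discounted at 10%.
example = likely_npv([[(1.0, -100.0)], [(0.9, 110.0), (0.1, 0.0)]], 0.10)
```

In this hypothetical example the unweighted NPV is approximately zero, while the likely NPV is approximately -10, showing how the response probabilities shift the valuation.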
A Monte Carlo simulation may be conducted at block 635 based on the calculated financials. In one embodiment, the Monte Carlo analysis may be carried out by defining a distribution (Φ) of interest around the coefficients and predictor variables. For each distribution, the mean (μ) and the standard deviation (σ) are defined or calculated. A margin of error (d) may also be defined by a user of the analysis tool. The analysis tool may then generate values of the distribution using a random number generator such that:
Variation in the probability of the response variables may be calculated using the Monte Carlo analysis by the analysis tool. The probability of each of the response variables then forms a distribution with a center and a variance. The probabilities of the response variables are mutually exclusive.
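The Monte Carlo step may be sketched as follows, under several assumptions that are not stated in the disclosure: the coefficients and the predictor are drawn from independent normal distributions, the model has one predictor plus an intercept, and the user-defined margin of error is not modeled. The function name and parameters are hypothetical.

```python
import math
import random
import statistics


def simulate_response_probability(beta_mean, beta_sd, x_mean, x_sd,
                                  draws=10_000, seed=0):
    """Return (mean, standard deviation) of simulated response probabilities.

    beta_mean and beta_sd are (intercept, slope) pairs; draws must be >= 2.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    probs = []
    for _ in range(draws):
        b0 = rng.gauss(beta_mean[0], beta_sd[0])
        b1 = rng.gauss(beta_mean[1], beta_sd[1])
        x = rng.gauss(x_mean, x_sd)
        eta = b0 + b1 * x
        # Numerically stable inverse logit
        if eta >= 0:
            p = 1.0 / (1.0 + math.exp(-eta))
        else:
            e = math.exp(eta)
            p = e / (1.0 + e)
        probs.append(p)
    return statistics.mean(probs), statistics.stdev(probs)
```

The returned pair corresponds to the center and spread of the simulated probability distribution; treating the coefficients as independent is a simplification, since fitted coefficients are generally correlated.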
The probabilities of the response variables are then provided in a decision matrix as a stand-in for the marginal utility obtained by one of the agents. Marginal utility can generally express many tangible and intangible factors that may be difficult to quantify. Thus, an advantage of the process performed by the analysis tool in process 600 is that likely responses based on a posteriori information may be used as marginal utility (e.g., the frequencies of actions that an agent has taken in the past are the marginal utilities of those actions). A decision matrix is generated at block 640 containing the marginal utilities. The decision matrix is described below in more detail with reference to
Results of the report may be displayed for a user of the analysis tool at block 645. Based on results displayed to a user of the analysis tool, one or more decisions may be made. It should also be appreciated that additional reports may be generated by an analysis tool as a result of one or more acts of process 600.
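The decision matrix generated at block 640 may be pictured as a table whose rows are business actions and whose columns are agent responses, with each cell holding a response probability and an analysis value. The sketch below is illustrative only: the layout, the function names, the sample figures, and the use of a simple expected-value ranking are assumptions and do not represent the game theory analysis of the disclosure.

```python
def build_decision_matrix(response_probs, analysis_values):
    """Combine probabilities and values into {action: {response: (p, value)}}."""
    return {
        action: {
            response: (p, analysis_values[action][response])
            for response, p in responses.items()
        }
        for action, responses in response_probs.items()
    }


def best_action(matrix):
    """Rank actions by probability-weighted value; return (action, scores)."""
    scores = {
        action: sum(p * value for p, value in row.values())
        for action, row in matrix.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical lender actions and mortgage holder responses.
probs = {"modify": {"pay": 0.8, "default": 0.2},
         "hold": {"pay": 0.6, "default": 0.4}}
values = {"modify": {"pay": 90.0, "default": -50.0},
          "hold": {"pay": 100.0, "default": -50.0}}
choice, scores = best_action(build_decision_matrix(probs, values))
```

With these made-up figures, the "modify" action scores higher than "hold" because the greater probability of payment outweighs the lower payoff.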
Referring now to
Referring now to
Referring now to
At block 920, the user can view output of the analysis tool for making a decision associated with the scenario modeled. For example, the analysis tool may be used to analyze a mortgage loan for resale. According to another embodiment, process 900 may be performed by an analysis tool to provide an estimate of a value, such as a loan purchase price. Process 900 may further include the optional act of customizing data for the scenario modeled at block 925.
Referring now to
Referring now to
Based on the predicted NPV 1120, the analysis tool may then perform a decision making step at block 1135. When the predicted NPV is below a user selected threshold (“No” path out of decision block 1135), the analysis tool can indicate that the individual loan is below the threshold at block 1140. When the predicted NPV is acceptable based on the user selected threshold (“Yes” path out of decision block 1135), the analysis tool can indicate that the individual loan meets the threshold at block 1145. It may also be appreciated that the analysis tool can provide one or more graphical representations 1150 of the individual loan for analysis of one or more loans.
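The threshold comparison at decision block 1135 may be sketched as a simple partition of loans by predicted NPV. The function name, the loan identifiers, and the choice to treat an NPV equal to the threshold as acceptable are assumptions made for illustration.

```python
def screen_loans(predicted_npvs, threshold):
    """Split loans into those meeting the threshold and those below it.

    predicted_npvs maps a loan identifier to its predicted NPV.
    """
    meets = {loan: v for loan, v in predicted_npvs.items() if v >= threshold}
    below = {loan: v for loan, v in predicted_npvs.items() if v < threshold}
    return meets, below


# Hypothetical loan pool and user selected threshold.
meets, below = screen_loans({"loan_a": 1200.0, "loan_b": -300.0, "loan_c": 500.0}, 500.0)
```

The two returned dictionaries correspond to the "Yes" and "No" paths out of the decision block, respectively.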
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art. Trademarks and copyrights referred to herein are the property of their respective owners.
This application claims the benefit of U.S. provisional Application No. 61/019,801 filed Jan. 8, 2008.
Number | Date | Country
---|---|---
61019801 | Jan 2008 | US