The technology described herein relates generally to risk factor simulation and more specifically to the application of different simulation techniques to different risk factors in a single simulation.
In order to forecast risk, a set of variables that describe the economic state of the world are simulated into the future. These variables are often called risk factors. The risk factors have different attributes and behaviors and are unique contributors to the entire economic system. The risk factors are often modeled as a correlated system. A simulation forecast of interest is usually not only a single point but a distribution of possible values in the future. Using the simulated forecasted values of the risk factors, a portfolio may be analyzed to calculate a risk measure, such as Value at Risk (VaR).
There are several popular simulation methods, including Monte Carlo simulation, covariance matrix simulation, historical simulation, and scenario simulation, as well as others. All of these simulation methods have their own advantages and limitations. From a technical point of view, each simulation methodology has one or more, but not all, of these advantages: accurate forecasts, easy specification, and fast simulation computation. Unfortunately, each also suffers from one or more of the following drawbacks: inaccurate forecasts, difficult specification, and slow simulation computation. Traditionally, because of the importance of the correlation between risk factors, only a single simulation method was used for all risk factors in a risk management application.
In accordance with the teachings herein, computer-implemented systems and methods are provided for generating a simulated forecast based on members of a pool of input risk factor variables. Certain members of the pool of input risk factor variables are identified as members of a first set of variables, and certain other members of the pool of input risk factor variables are identified as members of a second set of variables. A first simulation is generated via a first simulation method using the first set of variables, and a second simulation is generated via a second simulation method that differs from the first simulation method using the second set of variables. The first simulation and the second simulation are generated using correlations among variables in the first set of variables and variables in the second set of variables.
As another example, a computer-implemented method for providing a simulated forecast based on correlated members of a pool of input risk factor variables representing input data includes identifying certain members of the pool of input risk factor variables as being members of a first set of variables and identifying certain other members of the pool of input risk factor variables as being members of a second set of variables. A first simulation is generated via a first simulation method using the first set of variables to generate a set of first results, and a second simulation is generated via a second simulation method that differs from the first simulation method using the second set of variables to generate a set of second results. The first simulation and the second simulation are generated utilizing correlations among variables in the first set of variables and variables in the second set of variables, and the set of first results and the set of second results are stored as a simulated forecast in a computer-readable memory.
As an additional example, a computer-implemented system for providing a simulated forecast based on correlated members of a pool of input risk factor variables representing input data includes a data processor. The system further includes a computer-readable memory encoded with instructions for commanding the data processor to perform a method that includes identifying certain members of the pool of input risk factor variables as being members of a first set of variables and identifying certain other members of the pool of input risk factor variables as being members of a second set of variables. A first simulation is generated via a first simulation method using the first set of variables to generate a set of first results, and a second simulation is generated via a second simulation method that differs from the first simulation method using the second set of variables to generate a set of second results. The first simulation and the second simulation are generated utilizing correlations among variables in the first set of variables and variables in the second set of variables, and the set of first results and the set of second results are stored as a simulated forecast in the computer-readable memory.
As a further example, a computer-readable memory may be encoded with instructions for commanding a data processor to perform a method that includes identifying certain members of the pool of input risk factor variables as being members of a first set of variables and identifying certain other members of the pool of input risk factor variables as being members of a second set of variables. A first simulation is generated via a first simulation method using the first set of variables to generate a set of first results, and a second simulation is generated via a second simulation method that differs from the first simulation method using the second set of variables to generate a set of second results. The first simulation and the second simulation are generated utilizing correlations among variables in the first set of variables and variables in the second set of variables, and the set of first results and the set of second results are stored as a simulated forecast in a computer-readable memory.
A hybrid simulation engine 104 may be utilized in a variety of ways. For example, users may want to model multiple groups of risk factors that describe different sources of risk in one integrated system. Different risk factor groups may be best modeled by specific simulation methods. The hybrid simulation engine 104 provides a single, easy mechanism to capture all the risk sources at the same time. As another example, it may be desirable to put time and effort into modeling risk factors that have a significant impact on a target forecast variable and to use simpler methods to model the remaining factors. The hybrid simulation engine provides flexibility for spending more computational time on the risk factors that are deemed important and less time on the remaining risk factors. As a further example, it may be desirable to retain the correlation structure of a risk system that is either specified by the user 102 or extracted from a time-series dataset. The hybrid simulation engine 104 provides the capability to apply different simulation methods to subgroups of risk factors while retaining the original correlation structure among the variables in those different simulations.
A hybrid simulation engine 104 may increase the capability and flexibility of simulations, simulate systems with risk factors of various characteristics, generate an integrated simulation result, improve performance without significant loss of accuracy, provide easy specification of large systems of risk factors, and retain the original correlation relationships of all risk factors, as well as provide many other features as described herein. The system 104 contains software operations or routines for providing a simulated forecast based on correlated members of a pool of input risk factor variables representing input data, such as historical time-series data. The generated data model can be used for many different purposes, such as simulation of physical processes (e.g., manufacturing processes, financial transaction processes, etc.) over a period of time. The users 102 can interact with the system 104 in a number of ways, such as over one or more networks 108. One or more servers 106 accessible through the network(s) 108 can host the hybrid simulation engine 104. The hybrid simulation engine 104 provides a simulated forecast based on correlated members of a pool of input risk factor variables representing input data. The one or more servers 106 are responsive to one or more data stores 110 for providing input data to the hybrid simulation engine 104. Among the data contained on the one or more data stores 110 may be risk factor historical data 112 used in configuring data models for simulations, as well as the simulation models 114 themselves. It should be understood that the hybrid simulation engine 104 could also be provided on a stand-alone computer for access by a user 102.
For example, historical time-series data for a set of risk factors, V1, V2, V3 and V4, may be received at 302. An automated variable set identification at 306 may determine that risk factors V1 and V3 have a high degree of information contribution, while risk factors V2 and V4 have a lesser degree of information contribution. Based on that determination, risk factors V1 and V3 may be identified as the first set of variables (“the priority set of variables”), while risk factors V2 and V4 are identified as the second set of variables (“the non-priority set of variables”). Because the priority set of variables has a high degree of information contribution, it may be desirable to use a more expensive simulation method, such as a Monte Carlo simulation, to simulate those variables. Although the non-priority set of variables may contribute less information, it may still be desirable to simulate those variables to maintain dependencies and correlations between non-priority set members and priority set members. Thus, the non-priority set of variables may be simulated using a less computationally intensive simulation method, such as a covariance simulation. The simulated outputs from the two different simulation techniques may then be output as a simulated forecast at 314.
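The split at 306 can be sketched in Python. The description does not define how information contribution is measured, so this sketch assumes a proxy: the absolute correlation between each factor's period-over-period changes and the changes in a target series. The function name `split_by_contribution`, the `n_priority` cutoff, and the synthetic data are all hypothetical.

```python
import numpy as np


def split_by_contribution(history, names, target, n_priority):
    """Split risk factors into a priority set and a non-priority set.

    ASSUMPTION: "information contribution" is proxied by the absolute
    correlation between each factor's period-over-period changes and the
    changes in a target series; the description does not fix a measure.
    """
    changes = np.diff(history, axis=0)        # shape (T - 1, n_factors)
    target_changes = np.diff(target)          # shape (T - 1,)
    scores = np.array([
        abs(np.corrcoef(changes[:, j], target_changes)[0, 1])
        for j in range(changes.shape[1])
    ])
    ranked = np.argsort(scores)[::-1]         # highest score first
    priority = [names[j] for j in sorted(ranked[:n_priority])]
    non_priority = [names[j] for j in sorted(ranked[n_priority:])]
    return priority, non_priority


# Synthetic history in which the target is driven by V1 and V3 only.
rng = np.random.default_rng(0)
shocks = rng.standard_normal((1000, 4))
history = np.cumsum(shocks, axis=0)
target = np.cumsum(shocks[:, 0] + shocks[:, 2] + 0.1 * rng.standard_normal(1000))

priority, non_priority = split_by_contribution(
    history, ["V1", "V2", "V3", "V4"], target, n_priority=2)
print(priority, non_priority)   # ['V1', 'V3'] ['V2', 'V4']
```

Any other scoring rule (for example, a sensitivity of the target forecast to each factor) could be substituted without changing the overall split.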
As an example, in a large risk management system, there may be different expectations for the historical data used in simulation analyses. For example, under Basel II (2004), banks are required to use at least five years of data to estimate the probability of default from external, internal, or pooled data sources. For loss given default and exposure at default, the minimum data observation period should be seven years. However, if the available observation period for any of these data sources spans a longer period, and that data is relevant and material, the longer period must be used according to the requirements of Basel II. Such requirements result in different lengths of historical data for different groups of risk factors within a single risk management system. The hybrid simulation engine 502 may handle such a scenario by receiving variable set data that divides the risk factors into subgroups according to the length of available historical data. An appropriate simulation method is applied to each subgroup of risk factors based on the length of available historical data to be used, and simulated forecast values for the risk factors may be output while maintaining correlations among the risk factors in different subgroups.
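A minimal sketch of dividing risk factors into subgroups by available history follows. The thresholds, the method assigned to each threshold, the `group_by_history` helper, and the factor names are hypothetical; they only illustrate routing each subgroup to its own simulation method.

```python
from collections import defaultdict

# HYPOTHETICAL rules, checked from longest to shortest requirement. The
# description does not say which method suits which history length; here
# long histories feed an empirical historical simulation and shorter ones a
# model-based Monte Carlo simulation.
RULES = [(7.0, "historical"), (5.0, "monte_carlo")]


def group_by_history(history_years, rules=RULES):
    """Map each risk factor to a simulation-method subgroup by years of data."""
    groups = defaultdict(list)
    for factor, years in history_years.items():
        for min_years, method in rules:
            if years >= min_years:
                groups[method].append(factor)
                break
        else:  # no rule matched: flag the factor rather than guess a method
            groups["insufficient_history"].append(factor)
    return dict(groups)


# Hypothetical risk factors and their available history, in years.
available = {"pd_corporate": 5.5, "lgd_mortgage": 9.0,
             "ead_cards": 7.0, "pd_new_product": 3.0}
print(group_by_history(available))
```

The correlations across these subgroups would still need to be imposed by the hybrid simulation itself; the grouping only decides which method each subgroup uses.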
Maintaining correlations among risk factors in different subgroups may be important for generating accurate forecasts in some scenarios. In a large risk management system, different risk factors, because of their sources and modeling expectations, may require different simulation models and may not be implementable in one single simulation. Some risk factors may require model-based simulation; others may require empirical historical simulation. A hybrid simulation combines different simulation methods in one simulation run in order to generate an aggregated scenario of the world. When risk factors are modeled marginally within each subgroup, a correlation structure is often desired on top of the groups in order to capture the dependency among different risk factors.
For example, for a collateralized debt obligation (CDO), it is important to understand the correlated dependency among the underlying entities in the CDO pool in addition to the risk characteristics of each individual entity. One lesson learned from recent financial crises is that a risk management system should not segregate the risk factors, because the dependency greatly affects the simulated results. Using CDOs as an example, the senior tranche (the safest portion of a CDO) benefits from a low correlation among the underlying entities in the pool, while the equity tranche (the least protected portion of a CDO) benefits from a high correlation. The correlation of the housing market to these tranches has often been significantly understated by analysts. When this correlation is taken into account, the safest portion of a CDO (e.g., an AAA-rated senior tranche of a mortgage-backed security) can suffer much larger losses than expected under a model that does not maintain the correlation. Ignoring the correlation has caused many financial institutions that either held such “safe” investments or provided protection on some of the CDO tranches to fail.
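The tranche effect described above can be reproduced with a standard one-factor Gaussian copula (Vasicek large-pool) model, swapped in here purely for illustration; it is not presented as the method of this description. The sketch assumes a large homogeneous pool, a 5% default probability, full loss on default, and hypothetical attachment points; `tranche_expected_loss` is a hypothetical helper.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tranche_expected_loss(default_prob, rho, attach, detach, n_sims=50_000, seed=1):
    """Expected tranche loss as a fraction of the tranche notional.

    ASSUMPTIONS: large homogeneous pool, 100% loss given default, and a
    one-factor Gaussian copula (Vasicek) model of default dependence.
    """
    rng = random.Random(seed)
    threshold = STD_NORMAL.inv_cdf(default_prob)
    width = detach - attach
    total = 0.0
    for _ in range(n_sims):
        m = rng.gauss(0.0, 1.0)  # common (systematic) factor shared by all names
        # Fraction of the pool that defaults, conditional on the common factor.
        pool_loss = STD_NORMAL.cdf((threshold - rho ** 0.5 * m) / (1.0 - rho) ** 0.5)
        total += min(max(pool_loss - attach, 0.0), width) / width
    return total / n_sims


for rho in (0.1, 0.6):
    senior = tranche_expected_loss(0.05, rho, attach=0.30, detach=1.00)
    equity = tranche_expected_loss(0.05, rho, attach=0.00, detach=0.03)
    print(f"rho={rho}: senior EL={senior:.5f}, equity EL={equity:.3f}")
```

Raising the correlation from 0.1 to 0.6 should increase the senior tranche's expected loss and decrease the equity tranche's, matching the qualitative claim in the text.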
A copula is a mathematical framework that separates the dependence (correlation) structure of a system of variables from the marginal distributions of the variables. A copula may be viewed as a multivariate distribution whose marginal distributions are uniform on [0, 1]. For an n-dimensional random vector U on the unit cube, a copula C is:
$$C(u_1, u_2, \ldots, u_n) = \Pr(U_1 \le u_1, U_2 \le u_2, \ldots, U_n \le u_n),$$
where Pr denotes probability. A normal copula may be defined according to:
$$C_{\Sigma, F_1, F_2, \ldots, F_N}(u_1, u_2, \ldots, u_N) = \Phi_{\Sigma}\left(F_1^{-1}(u_1), F_2^{-1}(u_2), \ldots, F_N^{-1}(u_N)\right).$$
The uniforms are then transformed to marginal distributions based on the different simulation methods, as shown at 714 and 716: uniforms are transformed using the first simulation method at 714 and using the second simulation method at 716. Generating the first simulation and generating the second simulation may include generating a conditional normal distribution for a dependent set of risk factor variables in the first set of variables using a Schur complement based on correlations among members of the pool of input risk factor variables. The resulting simulated forecasts 718 are then output.
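The conditional normal step relies on the standard partitioned-Gaussian result: if $x = (x_a, x_b) \sim \text{Normal}(\mu, \Sigma)$, then $x_a \mid x_b$ is normal with mean $\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ and covariance $\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$, the Schur complement of $\Sigma_{bb}$ in $\Sigma$. A minimal NumPy sketch follows; the helper name `conditional_normal` is hypothetical.

```python
import numpy as np


def conditional_normal(mu, sigma, idx_a, idx_b, x_b):
    """Mean and covariance of x_a given x_b for x ~ Normal(mu, sigma).

    The conditional covariance is the Schur complement of sigma_bb in sigma.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s_aa = sigma[np.ix_(idx_a, idx_a)]
    s_ab = sigma[np.ix_(idx_a, idx_b)]
    s_bb = sigma[np.ix_(idx_b, idx_b)]
    # gain = s_ab @ inv(s_bb), computed with a linear solve instead of an inverse
    gain = np.linalg.solve(s_bb, s_ab.T).T
    cond_mean = mu[idx_a] + gain @ (np.asarray(x_b, dtype=float) - mu[idx_b])
    cond_cov = s_aa - gain @ s_ab.T
    return cond_mean, cond_cov


# Bivariate check: unit variances, correlation 0.5, condition on x_2 = 2.
mean, cov = conditional_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]],
                               idx_a=[0], idx_b=[1], x_b=[2.0])
print(mean, cov)  # [1.] [[0.75]]
```

Using `np.linalg.solve` rather than forming $\Sigma_{bb}^{-1}$ explicitly is the usual numerically safer choice.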
An example hybrid simulation utilizing a conditional normal approach, and the same example utilizing a copula approach, are provided below. The example scenario contains two subgroups of risk factors. The first set of risk factor variables contains variables that are modeled as log returns of equity prices following a random walk. That is, normally distributed draws are made that represent changes in the return process:
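$$\text{return}_{i,t} = \text{return}_{i,t-1} + \varepsilon_{i,t}, \quad \text{where} \quad \varepsilon_{i,t} = \sigma_{\text{return}}\, e_{i,t}, \quad e_{i,t} \sim \text{Normal}(0,1).$$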
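A sketch of the return random walk, assuming a separate volatility per equity (the description shows a single $\sigma_{\text{return}}$) and taking the Normal(0,1) draws $e_{i,t}$ as an input so that correlated draws can be supplied from elsewhere. The function name `simulate_returns` is hypothetical.

```python
import numpy as np


def simulate_returns(return0, sigma_return, e):
    """Random-walk paths: return_{i,t} = return_{i,t-1} + sigma_i * e_{i,t}.

    e holds Normal(0,1) draws with shape (n_steps, n_assets); they may be
    correlated draws produced elsewhere. Returns paths for t = 1..n_steps.
    """
    eps = np.asarray(sigma_return, dtype=float) * np.asarray(e, dtype=float)
    return np.asarray(return0, dtype=float) + np.cumsum(eps, axis=0)


rng = np.random.default_rng(42)
paths = simulate_returns(return0=[0.0, 0.0], sigma_return=[0.01, 0.02],
                         e=rng.standard_normal((250, 2)))
print(paths.shape)  # (250, 2)
```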
The second set of variables contains only one risk factor, a spot interest rate, which is modeled with a CIR (Cox-Ingersoll-Ross) model. The formula for this model is:
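$$\text{rate}_t = \text{rate}_{t-1} + \kappa\,(\theta - \text{rate}_{t-1}) + \delta_t, \quad \text{where} \quad \delta_t = \sigma_{\text{rate}} \sqrt{\text{rate}_{t-1}}\, \xi_t, \quad \xi_t \sim \text{Normal}(0,1).$$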
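A sketch of the discretized CIR update exactly as written above (unit time step). Because the discrete update can produce a negative rate, which would make $\sqrt{\text{rate}_{t-1}}$ undefined, this sketch floors the rate at zero inside the square root; that floor is an assumption, not part of the description. The function name and parameter values are hypothetical.

```python
import numpy as np


def simulate_cir(rate0, kappa, theta, sigma_rate, xi):
    """rate_t = rate_{t-1} + kappa*(theta - rate_{t-1}) + sigma_rate*sqrt(rate_{t-1})*xi_t."""
    rates = np.empty(len(xi))
    prev = float(rate0)
    for t, shock in enumerate(xi):
        # ASSUMPTION: the discrete update can step below zero, so the rate is
        # floored at zero inside the square root.
        delta = sigma_rate * np.sqrt(max(prev, 0.0)) * shock
        prev = prev + kappa * (theta - prev) + delta
        rates[t] = prev
    return rates


rng = np.random.default_rng(7)
path = simulate_cir(rate0=0.05, kappa=0.1, theta=0.03, sigma_rate=0.02,
                    xi=rng.standard_normal(250))
```

With all shocks set to zero, the rate reverts toward $\theta$ geometrically: starting at 0.05 with $\kappa = 0.5$ and $\theta = 0.03$, the first two values are 0.04 and 0.035.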
In addition to the two models provided above, the risk factors in the two sets are related through their error terms, as represented by the covariance matrix Σ:
Converting independent random draws into correlated draws (which may in turn be converted to a correlated set of uniforms) may utilize a Cholesky factorization of the covariance matrix. A Cholesky factorization is defined as:
$$\Sigma = L L^{T},$$
where L is a lower triangular matrix. For the sample covariance matrix above:
A multivariate normal distribution may then be simulated using the following steps:
(M1) Draw samples independently from normal(0,1). In the example scenario, three values are drawn in each scenario replication:
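(M2) Transform the independent random draws to a correlated draw using the Cholesky factor:
$$Z = L R,$$
where R is the column vector of independent draws from (M1), so that the covariance of Z is $L L^{T} = \Sigma$.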
(M3) Apply Z for the error terms in the model.
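Steps (M1)-(M3) can be sketched with NumPy. The description's sample covariance matrix is not reproduced, so `SIGMA` below is a hypothetical positive-definite matrix with unit diagonal (the error terms are standard normal). Each column of `R` is one scenario replication of three independent draws; `correlated_draws` is a hypothetical helper.

```python
import numpy as np

# HYPOTHETICAL covariance of the standardized error terms (e_1, e_2, xi);
# the sample matrix from the description is not reproduced here.
SIGMA = np.array([[1.0, 0.6, -0.3],
                  [0.6, 1.0, -0.2],
                  [-0.3, -0.2, 1.0]])


def correlated_draws(sigma, n_reps, rng):
    """Return an array of shape (n_factors, n_reps); each column is one replication."""
    L = np.linalg.cholesky(sigma)                          # sigma = L @ L.T, L lower triangular
    R = rng.standard_normal((sigma.shape[0], n_reps))      # (M1) independent Normal(0,1) draws
    return L @ R                                           # (M2) Z = L R, with Cov(Z) = sigma


rng = np.random.default_rng(0)
Z = correlated_draws(SIGMA, 200_000, rng)
# (M3) Z[0] and Z[1] would serve as e_{1,t} and e_{2,t}, and Z[2] as xi_t.
print(np.round(np.cov(Z), 2))
```

The sample covariance of `Z` should be close to `SIGMA`, which confirms that the transform $Z = LR$ imposes the target correlation.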
The target variable in this case could be the price of a basket option of the two equities. The price of this basket option is a function of the two return processes and the rate process:
$$p_t = f(\text{return}_{1,t}, \text{return}_{2,t}, \text{rate}_t).$$
The hybrid simulation may be performed via multiple different approaches. For example, using a conditional normal distribution (a standard statistical result), the rate process may be identified as a priority risk factor and simulated using a Monte Carlo simulation, while the return processes may be identified as non-priority risk factors and simulated using a covariance simulation. Conditional on the realization of the rate process, the error terms of the covariance simulation may be drawn from a conditional normal distribution (for each $\xi_t = X$), with the conditional mean and conditional variance of the return process error terms given according to:
followed by an application of (M1)-(M3) to the conditional bivariate normal distribution defined above. The three risk factors are thus simulated within the same system to generate the forecasted distribution of the target variable.
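A sketch of the conditional normal approach for the example, using a hypothetical `SIGMA` (the description's matrix is not reproduced; index 2 is the rate error term $\xi_t$). Here the Monte Carlo simulation of the priority rate factor is reduced to drawing $\xi_t$; in a full run those draws would drive the CIR update. The return error terms are then drawn from the conditional bivariate normal, with the Schur complement as the conditional covariance. `hybrid_conditional_draws` is a hypothetical helper.

```python
import numpy as np

# HYPOTHETICAL covariance of (e_1, e_2, xi); index 2 is the rate error term.
SIGMA = np.array([[1.0, 0.6, -0.3],
                  [0.6, 1.0, -0.2],
                  [-0.3, -0.2, 1.0]])


def hybrid_conditional_draws(sigma, n_reps, rng):
    """Draw xi first (priority factor), then (e_1, e_2) given each xi_t = x."""
    s_aa = sigma[:2, :2]   # return error terms (non-priority set)
    s_ab = sigma[:2, 2:]   # cross-covariance with xi, shape (2, 1)
    s_bb = sigma[2, 2]     # variance of xi (priority set)
    # Priority factor: plain Monte Carlo draws of the rate error term.
    xi = np.sqrt(s_bb) * rng.standard_normal(n_reps)
    gain = s_ab / s_bb                       # s_ab @ inv(s_bb) for a scalar s_bb
    cond_mean = gain * xi                    # shape (2, n_reps): one mean per draw of xi
    cond_cov = s_aa - gain @ s_ab.T          # Schur complement, shared by all draws
    # (M1)-(M3) applied to the conditional bivariate normal.
    L = np.linalg.cholesky(cond_cov)
    e = cond_mean + L @ rng.standard_normal((2, n_reps))
    return np.vstack([e, xi])


rng = np.random.default_rng(1)
draws = hybrid_conditional_draws(SIGMA, 200_000, rng)
print(np.round(np.cov(draws), 2))  # close to SIGMA: the joint correlation is retained
```

Because the conditional draws are combined with the Monte Carlo draws of $\xi_t$, the joint sample covariance of all three error terms should be close to `SIGMA`, even though the two subgroups were simulated separately.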
As another example, using a copula approach, the distribution of each risk factor variable may be computed. These distributions may have a functional (parametric) form; alternatively, a simulated or empirical distribution may be calculated. Draws may then be simulated from a multivariate normal distribution according to (M1)-(M3). Using the marginal normal distribution of each component, the simulated values from the multivariate normal may be converted to form a vector of random values ranging from 0 to 1. Using the inverse cumulative distribution function that corresponds to each computed marginal distribution, each converted value may then be transformed into a simulated value for the corresponding risk factor variable.
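A sketch of the copula approach under stated assumptions: the latent correlation matrix `SIGMA` is hypothetical, the two return shocks use parametric normal marginals (standing in for the first simulation method), and the rate factor uses an empirical marginal built from synthetic historical rate changes (standing in for the second). `copula_hybrid` and all parameter values are illustrative only.

```python
import numpy as np
from statistics import NormalDist

# Standard normal CDF, applied elementwise to NumPy arrays.
phi = np.vectorize(NormalDist().cdf)

# HYPOTHETICAL correlation of the latent normals for (return_1, return_2, rate).
SIGMA = np.array([[1.0, 0.6, -0.3],
                  [0.6, 1.0, -0.2],
                  [-0.3, -0.2, 1.0]])


def copula_hybrid(sigma, return_sigmas, rate_history, n_reps, rng):
    # (M1)-(M3): correlated standard normal draws, one column per replication
    L = np.linalg.cholesky(sigma)
    z = L @ rng.standard_normal((sigma.shape[0], n_reps))
    # Convert each component to a uniform on (0, 1); clip to keep inv_cdf finite
    u = np.clip(phi(z), 1e-12, 1 - 1e-12)
    # First method: parametric Normal(0, s) marginals for the return shocks
    returns = np.vstack([np.vectorize(NormalDist(0.0, s).inv_cdf)(u[i])
                         for i, s in enumerate(return_sigmas)])
    # Second method: empirical marginal (quantiles) of historical rate changes
    rate_changes = np.quantile(rate_history, u[len(return_sigmas)])
    return returns, rate_changes


rng = np.random.default_rng(3)
# Synthetic, heavy-tailed "historical" rate changes, for illustration only.
rate_history = 0.001 * rng.standard_t(df=4, size=1000)
returns, rate_changes = copula_hybrid(SIGMA, [0.01, 0.02], rate_history, 20_000, rng)
print(np.corrcoef(returns[0], returns[1])[0, 1])
```

The empirical marginal keeps the rate values inside the observed historical range, while the shared latent normals keep the rate draws negatively related to the returns, as specified by `SIGMA`.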
A disk controller 860 interfaces one or more optional disk drives to the system bus 852. These disk drives may be external or internal floppy disk drives such as 862, external or internal CD-ROM, CD-R, CD-RW or DVD drives such as 864, or external or internal hard drives 866. As indicated previously, these various disk drives and disk controllers are optional devices.
Each of the element managers, real-time data buffer, conveyors, file input processor, database index shared access memory loader, reference data buffer and data managers may include a software application stored in one or more of the disk drives connected to the disk controller 860, the ROM 856 and/or the RAM 858. Preferably, the processor 854 may access each component as required.
A display interface 868 may permit information from the bus 852 to be displayed on a display 870 in audio, graphic, or alphanumeric format. Communication with external devices may optionally occur using various communication ports 873.
In addition to the standard computer-type components, the hardware may also include data input devices, such as a keyboard 872, or other input device 874, such as a microphone, remote control, pointer, mouse and/or joystick.
This written description uses examples to disclose the invention, including the best mode, and also to enable a person skilled in the art to make and use the invention. The patentable scope of the invention may include other examples. For example, in addition to simulating risk factor variables, many other different types of variables may be simulated using a hybrid simulation engine. As a further example, the systems and methods may include data signals conveyed via networks (e.g., local area network, wide area network, internet, combinations thereof, etc.), fiber optic medium, carrier waves, wireless networks, etc. for communication with one or more data processing devices. The data signals can carry any or all of the data disclosed herein that is provided to or from a device.
Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods and operations described herein. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
The systems' and methods' data (e.g., associations, mappings, data input, data output, intermediate data results, final data results, etc.) may be stored and implemented in one or more different types of computer-implemented data stores, such as different types of storage devices and programming constructs (e.g., RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
It should be understood that as used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of “and” and “or” include both the conjunctive and disjunctive and may be used interchangeably unless the context expressly dictates otherwise; the phrase “exclusive or” may be used to indicate situations where only the disjunctive meaning may apply.