1. Field of the Invention
This application relates generally to modeling behaviors associated with loans, and more specifically to projecting prepayment, delinquency, and default probabilities associated with mortgage loans across loan ages.
2. Related Art
Prepayment, delinquency, and default projections are important elements in the valuation of servicing portfolios as well as in new deal pricing assessment. Curves or models representing these values have typically been built using curve-fitting exercises based on past data. While such an approach is useful in some instances, these curves are generally unreliable if there are significant market disruptions or if the curves change dramatically.
With the recent turmoil in the financial markets, changes in borrower behavior have resulted in significant changes to these curves. Continual fine-tuning and adjustments may now be required to accommodate these changes when performing curve fitting. As such, there is less confidence in future predictions through curve fitting because the fine tuning process is manual and relatively unstructured.
It would be desirable to have a statistically sound prediction approach, drawing on a suite of advanced statistical techniques, that can accommodate market shocks without requiring significant, continual fine-tuning.
Aspects in accordance with the present invention meet these needs by providing a statistically sound prediction approach for predicting loan attributes using regression techniques that can accommodate market shocks and that does not require significant, continual manual fine-tuning. A plurality of models are generated and used to predict the probability of loan behaviors, such as default, delinquency, and pre-payment.
Aspects include a method of generating a model for predicting loan behavior, the method including receiving loan data for a plurality of loans; preparing the loan data for analysis; grouping the loans into a plurality of hierarchical segments based on shared characteristics; generating a logistic regression model for each segment; and generating an overall prediction model for at least one of prepayment, delinquency, and default across the plurality of segments.
Preparing the loan data for analysis may include formatting, imputing missing data, and/or applying an outlier treatment to the loan data. Imputing missing data may include resetting interest rates applicable to ARM products. Applying an outlier treatment may include limiting the values of a particular field to a certain range. Grouping the loans into a plurality of segments based on shared characteristics may include grouping the loans based on loan type, change in Housing Price Index (HPI) since origination, and/or loan age. Generating a logistic regression model for each segment may include generating a regression model for the probabilities of prepayment, default, and/or delinquency for each of the segments.
Aspects may further include generating a calendar month wise model by applying the corresponding model to generate probabilities for each segment for the calendar month and combining the generated probabilities.
Aspects may further include scoring each loan at each age for probability of prepayment, default, and delinquency based on the corresponding generated models and the relevant data for each loan.
Aspects may further include calculating the current amount outstanding at the end of each month based on the generated probability models, calculating a probability of prepayment from the prepayment model, and/or calculating a projected unpaid principal balance at each age of the loan by multiplying the probability of prepayment by the current unpaid balance.
Aspects may further include a system for generating a model for predicting loan behavior, the system including means for receiving loan data for a plurality of loans; means for preparing the loan data for analysis; means for grouping the loans into a plurality of hierarchical segments based on shared characteristics; means for generating a logistic regression model for each segment; and means for generating an overall prediction model for at least one of prepayment, delinquency, and default across the plurality of segments.
Aspects may further include a system for generating a model for predicting loan behavior, the system including a processor; a user interface functioning via the processor; and a repository accessible by the processor; wherein the repository is configured to receive and store loan data for a plurality of loans, and wherein the processor is configured to: prepare the loan data for analysis; group the loans into a plurality of hierarchical segments based on shared characteristics; generate a logistic regression model for each segment; and generate an overall prediction model for at least one of prepayment, delinquency, and default across the plurality of segments.
Additional advantages and novel features of these aspects of the invention will be set forth in part in the description that follows, and in part will become more apparent to those skilled in the art upon examination of the following or upon learning by practice of the invention.
Various exemplary embodiments of the systems and methods will be described in detail, with reference to the following figures, wherein:
These and other features and advantages of this invention are described in, or are apparent from, the following detailed description of various exemplary embodiments.
A statistically sound prediction approach for predicting loan attributes is described herein using regression techniques that can accommodate market shocks and does not require significant, continual manual fine-tuning. A plurality of models are generated and used to predict the probability of loan behaviors, such as default, delinquency, and pre-payment.
Modeling engine 120 may be configured to perform statistical processing on a predefined population of data to generate models which may be used to predict customer behaviors associated with various loan products. According to some aspects, the predefined data population includes historical loan data. The data may be split through a hierarchical clustering schema into a number of predetermined categories.
As depicted at 206, for each loan type category, loans are further divided based on the housing price index (HPI) change associated with the loan since origination. For example, loans may be grouped as having an HPI change of less than −5% since origination, having an HPI change between 0% and 5% since origination, and having an HPI change greater than 10% since origination. Other groupings may also be used. HPI predictions may be performed using econometric and time series ARIMA models, according to some aspects of the invention. HPI values, and thus the change from origination, may be available for any loan at any point in time.
Once the population has been segmented by product type and HPI change, the segments may be further split according to age of the loan, as depicted at 208. According to some aspects, the loans may be segmented at each individual age level (e.g., 1 month, 2 months, etc.). In other aspects, loans may be grouped into age ranges, such as 1-5 months, 6-10 months, and/or any other age range grouping. For example, the loans may be separated into one month segments across 60 months. HPI change since origination and loan age are values which change over time. As such, the segments may be considered dynamic segments. As the loans age, they move from one segment to another over time, and therefore have a different regression equation applied to them at different times.
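The dynamic segmentation described above can be illustrated with a minimal Python sketch. The bucket edges, labels, and the function names `hpi_bucket` and `segment_key` are hypothetical and chosen only for illustration; the description gives example groupings and expressly allows others.

```python
from bisect import bisect_right

# Hypothetical, contiguous HPI-change bucket edges in percent (an assumption;
# the description only gives example groupings and allows others).
HPI_EDGES = [-5.0, 0.0, 5.0, 10.0]
HPI_LABELS = ["<-5%", "-5% to 0%", "0% to 5%", "5% to 10%", ">=10%"]


def hpi_bucket(hpi_change_pct: float) -> str:
    """Return the label of the bucket containing the HPI change."""
    return HPI_LABELS[bisect_right(HPI_EDGES, hpi_change_pct)]


def segment_key(product: str, hpi_change_pct: float, age_months: int) -> tuple:
    """Build the (product, HPI bucket, age) key that selects a segment model."""
    return (product, hpi_bucket(hpi_change_pct), age_months)


# The same loan lands in different segments as its age and HPI change evolve.
early = segment_key("ARM 2/28", 2.0, 3)
later = segment_key("ARM 2/28", -7.5, 26)
```

Because the key is recomputed at every age, a loan may be scored by a different regression equation each month, which is the sense in which the segments are dynamic.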
Referring back to
To predict age level probabilities, all loans belonging to a particular age or age group are scored using appropriate models. The selection of models may depend on the product type and the HPI change observed since origination. The projected unpaid balance at each age is calculated by deducting the probability of default and probability of prepayment from the previous unpaid balance iteratively. These probabilities may be multiplied by their projected unpaid balance to get an expected voluntary payoff balance for that particular age. The voluntary payoff balance may be added across all loans of a particular age to get the total expected voluntary payoff balance. This value may then be divided by the total projected unpaid balance of loans of that age to determine a percentage of prepayment at that age. This value is known as the Single Month Mortality Rate (SMMR). The SMMR may then be converted to an annual Constant Prepayment Rate (CPR) for the particular age, and the same process may then be repeated for all ages.
Modeling engine 120 may be further configured to calculate the probability of certain behaviors, using the generated models. Modeling engine 120 may be configured to generate probabilities of one or more behaviors for each age or age range for which models have been generated. For example, modeling engine 120 may be configured to calculate the probability of delinquency, prepayment, and default for each month of a loan's life.
The above-mentioned probabilities may be calculated using the following equation:
p = e^(α + β1X1 + β2X2 + β3X3 + …) / (1 + e^(α + β1X1 + β2X2 + β3X3 + …))
where α represents a regression model constant, β represents coefficient of prediction for predictor X, and X represents any variable used in generating the model. Probability prediction module 140 may be configured to generate models plotting the probabilities. Table 2 illustrates exemplary variables, any combination of which may be used in generating a model for each segment. These variables may relate, for example, to the borrower, the property, the loan, or the product. Different combinations of variables may be employed with different coefficients for each of the models. Regression model constant α and coefficient predictors β1, β2, β3 may be selected to match the equation to a measured curve.
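As a minimal sketch of how a separate constant α and coefficient set β might be held for each segment, the following Python example looks up a segment's model and applies the logistic form p = e^s/(1 + e^s) to the linear score s. The segment keys, predictor names (`ltv`, `is_owner_occupied`), and all numeric values are hypothetical assumptions for illustration, not fitted values from the description.

```python
import math

# Hypothetical per-segment models: key -> (alpha, {predictor name: beta}).
# All names and numbers here are illustrative assumptions, not fitted values.
MODELS = {
    ("ARM 2/28", "<-5%", 26): (-3.0, {"ltv": 1.5, "is_owner_occupied": -0.4}),
    ("Fixed", "0% to 5%", 12): (-4.0, {"ltv": 0.8}),
}


def predict_probability(segment, loan):
    """Apply the segment's logistic model to one loan's predictor values."""
    alpha, betas = MODELS[segment]
    score = alpha + sum(beta * loan[name] for name, beta in betas.items())
    # 1 / (1 + e^-s) is algebraically identical to e^s / (1 + e^s).
    return 1.0 / (1.0 + math.exp(-score))


loan = {"ltv": 0.8, "is_owner_occupied": 1}
p = predict_probability(("ARM 2/28", "<-5%", 26), loan)
```

Keeping the models in a mapping keyed by segment allows each segment to use a different combination of variables, as the description contemplates.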
All the variables listed in Table 1 are variables as of loan origination. The models may be built with only such variables because predictors that change over time would require unknown future values in order to make future projections. Therefore, the information available as of loan origination may be used as predictors in the generated models. For second lien models, the same variables may be used; however, the variables for both the second lien and its corresponding first lien will be required. The corresponding first lien may also be scored, because the second lien probabilities are dependent on the behavior of its first lien.
Storage 140 may be provided to permanently and/or temporarily store data used in generating models and/or calculating probabilities for particular loans. For example, storage 140 may be configured to store historical loan data, data received from a deal tape, calculated probabilities, and/or other data.
Upon receipt of the deal tape, data necessary for predicting a desired attribute may be formatted and prepared for modeling, as depicted at 320. Formatting may include, for example, replacing values of certain fields with standard values, which may be used in performing the prediction calculations. Formatting the data may also include imputing any missing data, creating any derived fields, and treating any outlier data.
When imputing missing data, the missing value may be replaced with its most logical and probable value. For example, a field resetting interest rates may only be applicable to ARM products since a fixed product has a fixed rate across the lifetime of the product. Thus, if this field is blank or missing for a fixed product, a predefined status indicator may be inserted into the field.
Creating derived fields may include numerical/logical transformations of existing fields. For example, an HPI Change field may be derived for each loan based on the value at origination and current or projection values.
Outlier treatment may include limiting the values in a particular field to a certain range. Extreme values may cause the data to be unduly biased. For example, an original balance of greater than $2,000,000 may only apply to less than 0.5% of all loans. However, if such values were included in a calculation, a large biasing effect could result. As such, in accordance with some aspects of the invention, this balance may be limited to a value of $2,000,000.
Some variables used in performing calculations are derived from other fields. For example, one variable used in some models is the “Change over time in Housing Price Index.” This variable may be derived from the housing price index as of loan origination and the housing price index as of the current date. Other variables may also be derived from other fields.
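The preparation steps described above (imputation, derived fields, and outlier treatment) can be sketched in Python as follows. The `Loan` record, its field names, the `"N/A"` status indicator, and the use of a percentage change for the HPI Change field are assumptions made for illustration; only the $2,000,000 cap is taken from the example in the description.

```python
from dataclasses import dataclass, replace
from typing import Optional

BALANCE_CAP = 2_000_000  # cap taken from the example in the description
NOT_APPLICABLE = "N/A"   # hypothetical status indicator for fixed products


@dataclass(frozen=True)
class Loan:
    product: str                            # e.g. "Fixed" or "ARM 2/28"
    original_balance: float
    rate_reset: Optional[str]               # only meaningful for ARM products
    hpi_at_origination: float
    hpi_current: float
    hpi_change_pct: Optional[float] = None  # derived field


def prepare(loan: Loan) -> Loan:
    """Impute, derive, and cap fields on a single loan record."""
    rate_reset = loan.rate_reset
    if rate_reset is None and loan.product == "Fixed":
        rate_reset = NOT_APPLICABLE  # imputation for a non-applicable field
    # Assumed definition: percentage change in HPI since origination.
    hpi_change = 100.0 * (loan.hpi_current / loan.hpi_at_origination - 1.0)
    return replace(
        loan,
        rate_reset=rate_reset,
        original_balance=min(loan.original_balance, BALANCE_CAP),
        hpi_change_pct=hpi_change,
    )


raw = Loan("Fixed", 2_500_000.0, None, 200.0, 190.0)
clean = prepare(raw)
```

In this sketch the record is immutable and `prepare` returns a new copy, so the raw deal tape values remain available for audit.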
After data has been formatted, hierarchical clusters may be created, as depicted at 330. More particularly, the loans may be divided into predetermined categories, such as those depicted in
Among others, the loans may be segmented across first lien fixed, first lien ARM 2/28, first lien ARM 3/27, and junior lien products.
For example, HPI change may be evaluated across loans from their origination and the groups of loans may be further categorized based on HPI change.
As the loans are segmented across age groups, models may be built at each individual age level segment. However, when the necessary amount of data is unavailable across all age groups, for each of the product and HPI Change segments, groups of ages may be combined above a certain point.
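One possible way to combine ages when data becomes sparse is sketched below in Python. The function name `group_ages`, the `min_count` threshold, and the rule of pooling every age from the first sparse age onward are assumptions; the description states only that groups of ages may be combined above a certain point.

```python
def group_ages(counts_by_age, min_count):
    """Keep each age as its own segment until data gets thin, then pool.

    counts_by_age maps age in months -> number of loan observations.
    Returns a list of (first_age, last_age) ranges.
    """
    ages = sorted(counts_by_age)
    groups = []
    for age in ages:
        if counts_by_age[age] < min_count:
            # First thin age found: pool it with every later age.
            groups.append((age, ages[-1]))
            break
        groups.append((age, age))
    return groups


counts = {1: 900, 2: 850, 3: 400, 4: 120, 5: 60}
groups = group_ages(counts, min_count=500)
```

With these illustrative counts, ages 1 and 2 keep individual models while ages 3 through 5 share one.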
As depicted at 340, probabilities of voluntary prepayment, default, and delinquency may be determined for each age/age range, by running a model engine.
For the default model, a separate classification may be used. For example, adequate data might not be present to build a model specific to the different products. Therefore, a common set of models may be built for the default model. In addition, a different range of HPI Change may be applied to generate the default models. For example, the ranges may include (a) HPI Change<−10%, (b) −10%<HPI Change<−5%, (c) −5%<HPI Change<5%, and (d) HPI Change>5%. Additionally, for generating the default models, after the loans have been segmented based on HPI Change and age, the loans may be further segmented based on the probability of delinquency of the loan at that time.
For junior liens, the corresponding CPR/delinquency and default generated for the first liens may be applied as predictors into the models for the junior lien loan predictions.
According to some aspects, the current amount outstanding at the end of each month may be calculated based on the determined probability scores. The unpaid principal balance (UPB) at each age of the loan may be calculated, taking into consideration the probabilities of prepayment, default, and delinquency. These probabilities may be considered in combination, according to some aspects, or may be considered separately in other aspects of the invention.
Once the models are generated for each of the loan segments, each loan may be scored at each age for probability of voluntary prepayment, default, and delinquency. This scoring is done using the generated models for the appropriate segments and relevant loan data for each loan. A current amount outstanding at the end of each month may be calculated using probability of prepayment and default. Then, the aggregate scores of all the loans can be used to generate final CPR/CDR/Delinquency models.
For example, the projected UPB at each age of a loan may be calculated by deducting the probability of prepayment from the previous UPB. More particularly, once the probability of prepayment has been calculated (as described above), the probability may be multiplied by the current unpaid balance. This value is then subtracted from the current unpaid balance to arrive at a projected UPB which accounts for prepayment. These calculations may be performed iteratively for each loan age.
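The iterative calculation described above can be sketched in Python as follows. The function name `project_upb` and the input values are hypothetical, and the sketch accounts only for prepayment, ignoring scheduled amortization and default for simplicity.

```python
def project_upb(initial_upb, prepay_probs):
    """Return the projected UPB after each age.

    prepay_probs holds one prepayment probability per loan age, in order.
    """
    upb = initial_upb
    path = []
    for p in prepay_probs:
        upb -= p * upb  # subtract the expected prepaid amount for this age
        path.append(upb)
    return path


path = project_upb(100_000.0, [0.01, 0.02, 0.05])
```

Each step multiplies the current balance by the probability of prepayment and subtracts the result, so the projected balance after each age equals the prior balance times (1 − p).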
An expected voluntary payoff balance for the particular age in question may be calculated by multiplying the projected UPB by the probability of prepayment. This process is repeated for all loans of a particular age. Next, the voluntary payoff values for each loan at the particular age may be combined to calculate the total expected voluntary payoff balance.
The total expected voluntary payoff balance may be divided by the total projected UPB of loans of that age to determine the percentage of prepayment. This value is known as the Single Month Mortality Rate (SMMR). The SMMR may be converted to an annual CPR for that age. This process may be repeated for all ages or defined age ranges.
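The aggregation and annualization steps can be sketched in Python as follows. The description does not spell out the conversion from the monthly rate to CPR; the sketch assumes the commonly used compounding convention CPR = 1 − (1 − SMM)^12. The function names and loan values are hypothetical.

```python
def single_month_mortality(loans):
    """Compute the balance-weighted monthly prepayment rate for one age.

    loans is an iterable of (projected_upb, prepay_probability) pairs.
    """
    loans = list(loans)
    total_upb = sum(upb for upb, _ in loans)
    expected_payoff = sum(upb * p for upb, p in loans)
    return expected_payoff / total_upb


def smm_to_cpr(smm):
    """Annualize a monthly rate by compounding (assumed convention)."""
    return 1.0 - (1.0 - smm) ** 12


loans_at_age = [(100_000.0, 0.01), (300_000.0, 0.02)]
smm = single_month_mortality(loans_at_age)
cpr = smm_to_cpr(smm)
```

Here the expected voluntary payoff is 7,000 on a total projected UPB of 400,000, giving a monthly rate of 1.75% and an annualized rate of roughly 19%.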
For each calculated probability, the scores of all loans of a particular age may be aggregated and used to generate probability models, as depicted at 350. For example, to generate a Constant Prepayment Rate (CPR) model, the UPB calculated for each loan based on the probability of voluntary prepayment across a particular age is aggregated to get a total expected voluntary payoff balance. Similar calculations may be performed to generate Constant Default Rate (CDR) and delinquency models.
The generated models may be extracted, for example, into an XL file, and provided to Capital Markets or Business User teams.
An exemplary implementation in accordance with aspects of the present invention will now be described for a CPR model for an ARM 2/28 product having HPI Change since origination of <−5% for an age of 26 months.
Table 3 lists variables and their corresponding β value estimate that may be used to generate this probability model for this segment.
Thus, for this loan, using this model, the score = α + X1β1 + X2β2 + X3β3 + X4β4 + X5β5 = −10.6233 + 7.9855*(0.15) + 0.2142*0.02 + 0.5763*1 − 0.0244*40 = −7.939527. The probability of prepayment is p = e^score/(1 + e^score) = 0.04%. Thus, in this example, there is a very low chance that this loan will prepay at this time.
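The final step of this example, converting the stated score into a probability, can be checked with a short Python sketch. Only the score value is taken from the example; the variable names are illustrative.

```python
import math

score = -7.939527  # linear score quoted in the worked example

# Logistic transform: p = e^score / (1 + e^score)
p = math.exp(score) / (1.0 + math.exp(score))

# Express as a percentage rounded to two decimal places.
percent = round(100.0 * p, 2)
```

The unrounded probability is about 0.000356, which rounds to the 0.04% stated in the example.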
The present invention may be implemented using a combination of hardware, software and firmware in a computer system. In an aspect of the present invention, the invention is directed toward one or more computer systems capable of carrying out the functionality described herein. An example of such a computer system 500 is shown in
Computer system 500 includes one or more processors, such as processor 504. The processor 504 is connected to a communication infrastructure 506 (e.g., a communications bus, cross-over bar, or network). Various software aspects are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person skilled in the relevant art(s) how to implement the invention using other computer systems and/or architectures.
Computer system 500 can include a display interface 502 that forwards graphics, text, and other data from the communication infrastructure 506 (or from a frame buffer not shown) for display on a display unit 530. Computer system 500 also includes a main memory 508, preferably random access memory (RAM), and may also include a secondary memory 510. The secondary memory 510 may include, for example, a hard disk drive 512 and/or a removable storage drive 514, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 514 reads from and/or writes to a removable storage unit 518 in a well-known manner. Removable storage unit 518 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 514. As will be appreciated, the removable storage unit 518 includes a computer usable storage medium having stored therein computer software and/or data.
Alternative aspects of the present invention may include secondary memory 510 and may include other similar devices for allowing computer programs or other instructions to be loaded into computer system 500. Such devices may include, for example, a removable storage unit 522 and an interface 520. Examples of such may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an erasable programmable read only memory (EPROM), or programmable read only memory (PROM)) and associated socket, and other removable storage units 522 and interfaces 520, which allow software and data to be transferred from the removable storage unit 522 to computer system 500.
Computer system 500 may also include a communications interface 524. Communications interface 524 allows software and data to be transferred between computer system 500 and external devices. Examples of communications interface 524 may include a modem, a network interface (such as an Ethernet card), a communications port, a Personal Computer Memory Card International Association (PCMCIA) slot and card, etc. Software and data transferred via communications interface 524 are in the form of signals 528, which may be electronic, electromagnetic, optical or other signals capable of being received by communications interface 524. These signals 528 are provided to communications interface 524 via a communications path (e.g., channel) 526. This path 526 carries signals 528 and may be implemented using wire or cable, fiber optics, a telephone line, a cellular link, a radio frequency (RF) link and/or other communications channels. In this document, the terms “computer program medium” and “computer usable medium” are used to refer generally to media such as a removable storage drive 514, a hard disk installed in hard disk drive 512, and signals 528. These computer program products provide software to the computer system 500. The invention is directed to such computer program products.
Computer programs (also referred to as computer control logic) are stored in main memory 508 and/or secondary memory 510. Computer programs may also be received via communications interface 524. Such computer programs, when executed, enable the computer system 500 to perform the features of the present invention, as discussed herein. In particular, the computer programs, when executed, enable the processor 504 to perform the features of the present invention. Accordingly, such computer programs represent controllers of the computer system 500.
In an aspect of the present invention where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 500 using removable storage drive 514, hard drive 512, or communications interface 524. The control logic (software), when executed by the processor 504, causes the processor 504 to perform the functions of the invention as described herein. In another aspect of the present invention, the invention is implemented primarily in hardware using, for example, hardware components, such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s).
While the foregoing disclosure discusses illustrative aspects and/or embodiments, it should be noted that various changes and modifications could be made herein without departing from the scope of the described aspects and/or embodiments as defined by the appended claims. Accordingly, the exemplary embodiments of the invention, as set forth above, are intended to be illustrative, not limiting. Various changes may be made without departing from the spirit and scope of the invention. Therefore, the invention is intended to embrace all known or later-developed alternatives, modifications, variations, improvements, and/or substantial equivalents. Furthermore, although elements of the described aspects and/or embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated. Additionally, all or a portion of any aspect and/or embodiment may be utilized with all or a portion of any other aspect and/or embodiment, unless stated otherwise.
The present application for patent claims priority to Provisional Application No. 61/163,228 entitled “APPARATUS AND METHOD FOR MODELING LOAN ATTRIBUTES” filed Mar. 25, 2009, the entire contents of which are incorporated by reference herein.
Number | Date | Country
---|---|---
61163228 | Mar 2009 | US