This invention relates to event forecasting using computerized systems and methods.
Many processes involve a number of events that occur over time. For example, the process of applying for a driver's license may involve the following eight events:
Each event in the process typically has an associated start date and end date. As an example, shown in Table 1, consider an applicant for a driver's license who begins the process on Feb. 18, 2010, passes the written and road tests, and is mailed a driver's license on Mar. 16, 2010:
This process may also be illustrated in a flow chart, as shown in
For a process such as this driver's license application process, it may be helpful to be able to forecast certain event dates. For example, if another applicant submits a written test on Mar. 22, 2010, it may be helpful to be able to forecast the start or end date of a particular event, such as the road test (assuming that the applicant had passed the written test), which in this example is Event #5, or the department of motor vehicles mailing a driver's license (assuming that the applicant had passed the road test), which in this example is Event #8. This invention deals with such event forecasts.
Certain relatively limited methods are known in the art to provide some assistance in estimating when certain events may occur. For example, U.S. Pat. No. 7,783,562 discloses a method for obtaining an estimated financial outcome—a gain or a loss—for a particular loan. One element of that method is a method for obtaining an estimated liquidation time—an elapsed time from a last interest-paid date to the receipt of the liquidation proceeds received from the sale of the property—using a decision tree and various set time factors.
In another example, U.S. Patent Application Publication No. 2004/0019516 discloses a method for calculating the probability that one or more automobiles will be sold by a future date. This method uses survival analysis—a well-known statistical methodology—based on historical data for the number of days that an automobile remains on a sales lot to estimate a probability that one or more automobiles will be sold by a future date.
Among other limitations, these known methods are static; they assume that a process model (such as the process for liquidations or automobile sales) is stable over time. It would be advantageous to provide systems and methods that are dynamic: that can forecast events where the underlying duration distributions change over time.
In one embodiment of the invention, a data processing method for forecasting event dates is provided. The method includes the steps of identifying a plurality of defined process events, and estimating dynamically for at least one event a duration distribution between a starting date of the event and an end date of the process. The estimated duration distribution is used for generating one or more modeling parameters used for generating one or more forecasts.
In another embodiment, the method includes the steps of identifying a plurality of defined process events, and estimating dynamically for one event a duration distribution between a first date of the one event and a second date of another event.
In another embodiment, the first date is a starting date of the one event, and the second date is an end date of the last event of the process.
In another embodiment, the one event is the same as the other event.
In another embodiment, the method further includes computing for the one event a time elapsed from the first date to a current date.
In another embodiment, the method further includes determining, based on the time elapsed, a conditional duration distribution from the first date to the second date.
In another embodiment, the method further includes selecting a measure of distributional center of the conditional duration distribution.
In another embodiment, the selected measure of distributional center is a median, a mean, a trimmed mean, or a quantile reasonably close to the mean.
In another embodiment, the method further includes associating with at least one of the one or more forecasts an uncertainty measure of the conditional distribution.
In another embodiment, the uncertainty measure is an inter-quartile range, a standard deviation, a mean absolute deviation from the selected measure of distributional center, or a range.
In another embodiment, a data processing system for forecasting event dates is provided. The system includes a memory device, and a processor device operatively connected to the memory device and configured to perform a method. The method includes the steps of identifying a plurality of defined process events, and estimating dynamically for at least one event a duration distribution between a starting date of the event and an end date of the process. The estimated duration distribution is used for generating one or more modeling parameters used for generating one or more forecasts.
In another embodiment, the method that the processor device is configured to perform further includes identifying a plurality of defined process events, and estimating dynamically for one event a duration distribution between a first date of the one event and a second date of another event.
In another embodiment, the system further includes means for computing for the one event a time elapsed from the first date to a current date.
In another embodiment, the system further includes means for determining, based on the time elapsed, a conditional duration distribution from the first date to the second date.
In another embodiment, the system further includes means for selecting a measure of distributional center of the conditional duration distribution.
In another embodiment, the system further includes means for associating with at least one of the one or more forecasts an uncertainty measure of the conditional distribution.
In another embodiment, a computer program product for forecasting event dates is provided. The computer program product includes a computer readable storage medium that is embodied with computer readable program code. The computer readable program code includes computer readable program code configured to identify a plurality of defined process events, and estimate dynamically for at least one event a duration distribution between a starting date of the event and an end date of the process. The estimated duration distribution is used for generating one or more modeling parameters used for generating one or more forecasts.
In another embodiment, the computer readable program code includes computer readable program code configured to identify a plurality of defined process events, and estimate dynamically for one event a duration distribution between a first date of the one event and a second date of another event.
The systems and methods herein described accomplish event date forecasting for a process using dynamic estimation of a statistical model of the process based on historical data of the event dates for some or all of the events. Preferably, a forecast conditions on the forecast issue date (a “current date”) and yields a measure of center for the conditional distribution of the forecasted duration between the forecast issue date and the target event date. Typically, the forecasted target event date will be the date of the end event in a process, but any target event in a process may be treated as the end event, and thus any event date may be the forecasted date.
The dynamic estimation according to these systems and methods adjusts for temporal dynamics (such as shifts and other changes in historical-data distributions concerning a process). In addition, such systems and methods can adjust and improve the estimation using open cases (i.e., instances of a process still in progress) and delays (such as holds and moratoria in a process).
In one embodiment, a data processing method for forecasting event dates for a process having a plurality of process steps includes:
This embodiment may be illustrated using a mortgage origination process. As shown in
As shown in
Historical event data preferably consists of start and completion dates (alternatively called “end dates” or “stop dates”) for events in a process for a number of instances of the process. In the example of the mortgage origination process, each instance of the process is a particular mortgage or loan application, which can be identified by a loan number. A particular loan that has completed all events in the process is referred to as a closed case, one example of which is shown in Table 3. A particular loan that is still in process is referred to as an open case, one example of which is shown in Table 4.
In the examples of Tables 3 and 4, the last two columns are labeled “Start Date” and “Stop Date” because it is possible that an event may begin on a first date and end on a second date. For example, Event #2 (Underwriting—Review) begins for Loan #3224 on Mar. 2, 2010 and ends on Mar. 21, 2010, and for Loan #3225 begins on Mar. 3, 2010 and ends on Mar. 24, 2010.
Whether loan data represents an open case or a closed case depends on the target date for a forecast. For example, in Table 4, if the target date is the stop date of the last step of the process, then the absence of a stop date for Event #8 indicates that Loan #3225 is an open case of the mortgage origination process. In another example, if the target date is the start date of Event #5, then the presence of a start date for that event indicates that Loan #3225 is a closed case.
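The dependence of the open/closed classification on the target date can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the function name `is_closed_case`, and the use of `None` for a date not yet recorded are assumptions, and only the dates for Loan #3225 that are quoted in the text are included.

```python
from datetime import date

# Hypothetical record layout: event number -> (start_date, stop_date).
# None means the date has not been recorded yet. Only dates quoted in the
# text for Loan #3225 appear here; the missing start date for Event #8 is
# an assumption made for this sketch.
loan_3225 = {
    2: (date(2010, 3, 3), date(2010, 3, 24)),
    5: (date(2010, 4, 4), None),
    8: (None, None),
}

def is_closed_case(events, target_event, target_is_start):
    """Return True if the target date has already been observed."""
    start, stop = events.get(target_event, (None, None))
    target = start if target_is_start else stop
    return target is not None

# Target = stop date of Event #8: the loan is an open case.
print(is_closed_case(loan_3225, 8, target_is_start=False))
# Target = start date of Event #5: the same loan is a closed case.
print(is_closed_case(loan_3225, 5, target_is_start=True))
```

The same loan record therefore lands in different tables depending on which event date is being forecast.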
Preferably, the systems and methods forecast the time from the starting date of an event or process step (StepStartDate) to the end date of the last event or step in the process (ProcessEndDate). But the systems and methods may also be used to forecast the time from the start or end date of any event to the start or end date of any later event, or from the start date of an event to the end date of that event.
It may be desirable to have the generation of model parameters in
Data for loans such as Loan #3224—in which the target event has been completed—are preferably stored in a Steps Closed Table with the following columns:
Data for loans such as Loan #3225—in which the target event has not yet been reached—are preferably stored in a Steps Open Table with the following columns:
In the Steps Open Table, “today” is the day that the forecast is considered made, which may also be referred to as a “current date.” It may be the actual date when a system or method is used to make a forecast, or a date when the forecast is considered to be made.
Data concerning holds in the process that have been completed are preferably stored in a Holds Closed Table with the following columns:
Data concerning holds in the process that have not been completed are preferably stored in a Holds Open Table with the following columns:
Again, in the Holds Open Table, “today” is the day that the forecast is considered made.
As an example of a forecast made using a system or method, say that on Apr. 19, 2010 a forecast is desired for completing the mortgage application process for Loan #3225. As shown in Table 4, as of that date the loan is currently in Event #5 because Loan #3225 has started but not yet completed Event #5. A system or method therefore preferably generates model parameters based on a duration distribution from the Start Date of Event #5 to the end of the mortgage application process (in this case the Stop Date of Event #8).
The step of dynamically estimating for each process event the duration distribution between the starting date of the process step and the end date of the process (or other target date) preferably includes defining a series of time points {t1, t2, . . . tT} and a series of data weighting functions {w1, w2, . . . wT}. The data weighting functions may or may not depend on the data availability.
For example, a series of time points based on calendar quarters could consist of the last date of each quarter (T=4 and t1=March 31, t2=June 30, t3=September 30, and t4=December 31) or an approximate midpoint of each quarter (e.g., T=4 and t1=February 15, t2=May 15, t3=August 15, and t4=November 15). Many other series of time points may be used based on various intervals (e.g., daily, weekly, monthly, quarterly, yearly, etc.) and various points within those intervals (e.g., first date of each week, first date of second week each month, midpoint, last Thursday of each quarter, last date of each week, etc.). Preferably the time points are at regular intervals (e.g., quarterly), but irregular or random intervals may also be used (e.g., T=5 and t1=March 31, t2=June 15, t3=September 1, t4=November 15, and t5=December 24).
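As a minimal sketch of one of the series described above, the following Python function builds the quarter-end time points (T=4) for a given year. The function name `quarter_end_points` and the choice of a full calendar date (rather than month and day only) are assumptions made for illustration.

```python
from datetime import date, timedelta

def quarter_end_points(year):
    """Return the last calendar day of each quarter in `year` (T = 4)."""
    points = []
    for month in (3, 6, 9, 12):
        # Take the first day of the following month and step back one day.
        if month == 12:
            first_of_next = date(year + 1, 1, 1)
        else:
            first_of_next = date(year, month + 1, 1)
        points.append(first_of_next - timedelta(days=1))
    return points

for t in quarter_end_points(2010):
    print(t.isoformat())
```

Other series (weekly, monthly, irregular) could be produced the same way by changing how the list is populated.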
The duration may be measured in any suitable time unit. For example, in addition to the intervals mentioned above, shorter units (e.g., hours, minutes, seconds, etc.) or longer units (e.g., weeks, months, years, decades, centuries, millennia) may be used. Any forecast made using, and any date used by, the systems and methods described herein, including start dates, stop dates, and current dates, may be expressed at any level of precision (e.g., March 15; Mar. 15, 2010; or Mar. 15, 2010 at 4:15:3.5 pm, meaning 4:15 plus 3.5 seconds on the afternoon of Mar. 15, 2010).
Weighting functions may be of varying types known to those skilled in the art (e.g., step functions, piecewise linear functions, kernels).
For example, as shown in
w2 = 1 if t ∈ [t2−D, t2+D)
w2 = 0 otherwise
As skilled artisans will recognize, accurately estimating the duration distributions may require a certain minimum number of events L for each time point. In such a case, “data dependent” time windows may be used. For example, if time t2 has fewer than L events in the interval [t2−D, t2+D), the time window may be widened for that time point by increasing D so that the time window covers at least L events.
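A data-dependent window of this kind might be sketched as follows. The function names, the one-day widening increment, and the `max_days` safety limit are assumptions for illustration; the text does not specify how D is increased.

```python
from datetime import date, timedelta

def step_weight(event_date, center, half_width_days):
    """Step weight: 1 if event_date is in [center - D, center + D), else 0."""
    d = timedelta(days=half_width_days)
    return 1 if center - d <= event_date < center + d else 0

def widen_window(event_dates, center, half_width_days, min_events,
                 step_days=1, max_days=3650):
    """Increase D until the window around `center` covers at least
    `min_events` events. The increment and upper limit are assumptions."""
    d = half_width_days
    while d <= max_days:
        covered = sum(step_weight(e, center, d) for e in event_dates)
        if covered >= min_events:
            return d
        d += step_days
    raise ValueError("fewer than min_events events within max_days")

# Hypothetical event dates around a time point t2 = June 30.
events = [date(2010, 6, 1), date(2010, 6, 20),
          date(2010, 7, 15), date(2010, 8, 30)]
t2 = date(2010, 6, 30)
print(widen_window(events, t2, half_width_days=10, min_events=3))
```

Because the window is half-open, an event falling exactly on t2+D is excluded, which matches the interval notation used above.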
Another type of weighting function that may be used is a triangular (piecewise linear) function, which may or may not overlap for different time points. When weighting functions overlap, the same event can be used for different duration distributions. For example,
Another type of weighting function that may be used is a kernel. For example,
Those of skill in the art will also recognize that the same weighting function may be used for each point in the time series (e.g., if T=4, for t1, t2, t3 and t4 the weighting functions w1, w2, w3 and w4 will be the same, but centered at t1, t2, t3 and t4, respectively), or different weighting functions—or no weighting at all—may be used for some or all of the points in the time series.
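The triangular and kernel weighting functions mentioned above could look like the following sketch. Time is represented here as a plain number (for example, days since a reference date); the Gaussian form of the kernel, the parameter names, and the sample values are assumptions, since the text does not name a specific kernel.

```python
import math

def triangular_weight(t, center, half_width):
    """Piecewise-linear weight: 1 at the center, falling to 0 at
    center - half_width and center + half_width."""
    return max(0.0, 1.0 - abs(t - center) / half_width)

def gaussian_kernel_weight(t, center, bandwidth):
    """Unnormalized Gaussian kernel weight (one possible kernel choice)."""
    z = (t - center) / bandwidth
    return math.exp(-0.5 * z * z)

# With overlapping triangles, one event contributes to two time points.
event_day = 120
for center in (90, 180):
    print(center, triangular_weight(event_day, center, half_width=90))
```

Using the same function with a different `center` for each time point corresponds to the case where w1 through w4 share one shape.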
Dynamic estimation further includes modeling distributional changes over time using the previously defined series of time points and weighting functions, preferably by either:
Other methods (parametric, non-parametric, semi-parametric, etc.) of modeling distributional changes over time can also be used.
The distributions can be estimated based on complete observations only (based on data in the Steps Closed Table), or using an adjustment for censoring (such as the Kaplan-Meier adjustment) to make use of both the complete and incomplete observations (i.e., the data in both the Steps Closed and Steps Open Tables).
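For readers unfamiliar with the Kaplan-Meier adjustment, the following is a minimal sketch of the standard product-limit estimator applied to a mix of closed and open cases. The input layout (a duration in days plus a flag that is True for closed cases) and the sample values are assumptions for illustration and are not taken from the tables.

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate of S(t) = P(duration > t).

    durations: days from step start to process end (closed cases) or to
               "today" (open cases, i.e. censored observations).
    observed:  True for closed cases, False for open cases.
    Returns a list of (t, S(t)) at each distinct completion time.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        completed = 0
        leaving = 0
        # Group all cases sharing the same duration t.
        while i < len(pairs) and pairs[i][0] == t:
            completed += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if completed > 0:
            survival *= 1.0 - completed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical data: three closed cases and two open cases.
curve = kaplan_meier([10, 20, 20, 30, 40], [True, False, True, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

Open cases never lower the survival estimate directly; they only stay in the at-risk count until their censoring time, which is how they inform the estimate.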
Additionally, the distribution can be estimated using the “raw” StepStartDate-to-ProcessEndDate durations or after removing internal and/or external delays (holds, moratoria, other delays).
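Removing delays from a raw duration might be sketched as below. The function name, the representation of holds as (start, end) date pairs, the assumption that holds do not overlap one another, and the sample dates are all hypothetical.

```python
from datetime import date

def net_duration_days(step_start, process_end, holds):
    """Raw StepStartDate-to-ProcessEndDate duration in days, minus days
    spent in holds. Each hold is clipped to the measured interval; holds
    are assumed not to overlap one another."""
    raw = (process_end - step_start).days
    held = 0
    for hold_start, hold_end in holds:
        lo = max(hold_start, step_start)
        hi = min(hold_end, process_end)
        if hi > lo:
            held += (hi - lo).days
    return raw - held

# Hypothetical dates: a 40-day raw duration with two holds.
holds = [(date(2010, 2, 25), date(2010, 3, 4)),    # partly before the step
         (date(2010, 3, 10), date(2010, 3, 15))]
print(net_duration_days(date(2010, 3, 2), date(2010, 4, 11), holds))
```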
The step of computing the dynamic distribution for time period p (where p is, e.g., a time period covering a time point of interest ti or the current time period) may be done, in the parametric case, using the observed or forecasted parametric model parameters for time period p (or time point ti), or in the non-parametric case, by reversing the standardization (i.e., using observed or forecasted measures of center and variability for time period p to transform the characteristic distribution into an untransformed estimated distribution for time period p).
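The non-parametric case can be illustrated with a short sketch. It assumes that standardization means subtracting a measure of center and dividing by a measure of variability, which is one common reading and is not confirmed by the text; the function names and sample values are hypothetical.

```python
def standardize(durations, center, scale):
    """Map durations to a unit-free characteristic distribution
    (assumed form: subtract the center, divide by the variability)."""
    return [(d - center) / scale for d in durations]

def unstandardize(characteristic, center_p, scale_p):
    """Reverse the standardization using the observed or forecasted
    center and variability for time period p."""
    return [center_p + scale_p * z for z in characteristic]

# Hypothetical characteristic distribution and period-p measures.
characteristic = [-1.0, 0.0, 1.0]
print(unstandardize(characteristic, center_p=30, scale_p=10))
```

Under this reading, a shift in the process over time shows up only in `center_p` and `scale_p`, while the shape of the characteristic distribution is shared across periods.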
The step of determining, given the time elapsed, the conditional duration distribution from the starting date of the step (StepStartDate) to the end date of the process (ProcessEndDate) preferably includes, in the parametric case, deriving the conditional distribution analytically by conditioning it on the duration being larger than the time elapsed; or, in the non-parametric case, truncating the histogram to time periods that are larger than the time elapsed.
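The non-parametric truncation can be sketched as follows, treating the histogram as a mapping from duration to relative frequency. The function name and the sample durations are assumptions for illustration.

```python
from collections import Counter

def conditional_histogram(durations, elapsed):
    """Truncate the duration histogram to values larger than `elapsed`
    and renormalize so the remaining frequencies sum to 1."""
    kept = [d for d in durations if d > elapsed]
    if not kept:
        raise ValueError("no historical duration exceeds the time elapsed")
    counts = Counter(kept)
    total = len(kept)
    return {d: counts[d] / total for d in sorted(counts)}

# Hypothetical historical durations (days) and 15 days already elapsed.
print(conditional_histogram([10, 20, 20, 30, 40], elapsed=15))
```

In the parametric case the same conditioning would be applied analytically to the fitted distribution rather than to observed counts.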
The step of selecting a measure of distributional center preferably includes specifying the center of the conditional distribution as the median, mean, trimmed mean, any other quantile reasonably close to the mean, or other measure of distributional center.
The step of associating an uncertainty (variability) measure with the forecast preferably includes using the inter-quartile range, standard deviation, mean absolute deviation from a measure of center, range, or other measure of variability.
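One way to compute a center and an uncertainty measure for a conditional sample is shown below, using the median and the inter-quartile range from Python's standard `statistics` module. The function name and the sample values are hypothetical, and the quartile interpolation is whatever `statistics.quantiles` applies by default.

```python
from statistics import median, quantiles

def center_and_spread(conditional_durations):
    """Return (median, inter-quartile range) of the conditional durations."""
    q1, _, q3 = quantiles(conditional_durations, n=4)
    return median(conditional_durations), q3 - q1

# Hypothetical conditional durations in days.
center, iqr = center_and_spread([20, 20, 30, 40])
print(center, iqr)
```

Swapping in a mean and standard deviation, or another pair from the lists above, would only change the two calls inside the function.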
The number and types of processes for which event forecasting may be accomplished using the systems and methods described herein are practically unlimited. In addition to the examples discussed previously, other examples of such processes are immigration permissions (e.g., issuing a green card), warranty or insurance claim payouts, publication of an academic journal article, the issuance of a state license to operate a certain type of business, and a process termination date in any supply chain scenario that requires a sequence of steps.
As an example of forecasting in accordance with the systems and methods described herein, consider a forecast made on Apr. 19, 2010 of the completion of the last event in the mortgage origination process for Loan #3225 in Table 4. The target date to be forecast is the stop date of Event #8. From Table 4 it may be seen that the current step date is the start date of Event #5, which is Apr. 4, 2010. The current date or “today” is Apr. 19, 2010.
Of the 15 possible sets of model parameters for the end of the process—one set of parameters for each of the durations to the stop date of Event #8 from the start and stop dates of Events #1-7 and the start date of Event #8—the model parameters are generated for the duration from start date of Event #5 to the end date of Event #8. Using all the historical event data from the Steps Closed Table and the Steps Open Table, the model parameters are generated using dynamic estimation as described above that may adjust for temporal dynamics, open cases, and delays.
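The date arithmetic in this example can be made concrete with a short sketch. The two dates come from the text; the historical durations are hypothetical placeholders, and the simple truncate-then-median forecast stands in for the dynamically estimated model, whose parameters are not given.

```python
from datetime import date, timedelta
from statistics import median

def forecast_end_date(step_start, today, historical_durations):
    """Forecast the process end date as step_start plus the median of the
    historical durations that exceed the time already elapsed."""
    elapsed = (today - step_start).days
    remaining = [d for d in historical_durations if d > elapsed]
    if not remaining:
        raise ValueError("no historical duration exceeds the time elapsed")
    return step_start + timedelta(days=round(median(remaining)))

step_start = date(2010, 4, 4)    # start date of Event #5 (from the text)
today = date(2010, 4, 19)        # forecast issue date (from the text)
print((today - step_start).days)  # 15 days elapsed

# Hypothetical Event #5 start to Event #8 stop durations, in days.
history = [12, 25, 30, 38, 44, 50]
print(forecast_end_date(step_start, today, history))
```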
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a system, apparatus, or device running an instruction.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a system, apparatus, or device running an instruction.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may run entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the illustrations and/or block diagrams, and combinations of blocks in the illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which run via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in block diagram block or blocks. These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which run on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the block diagram block or blocks.
The block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more operable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or illustration, and combinations of blocks in the block diagrams and/or illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
This application claims the benefit of U.S. Provisional Patent Application No. 61/476,950, filed Apr. 19, 2011, the entire contents and disclosure of which are expressly incorporated by reference herein as if fully set forth herein.
Number | Date | Country
---|---|---
61476950 | Apr 2011 | US