The present application relates generally to information handling and/or electronic data processing and analytics, and more particularly to methods, computer systems, and computer program products that use, for example, advanced sampling techniques and that take into consideration and/or account for random effects, for example, to minimize risk.
With the growth of electronic data, it is becoming increasingly important to analyze and process that electronic data. With the recent advancement of information technology and the wide use of storing and processing electronic data, more and more demands are being placed on computing systems for the acquisition, processing, storage, and analysis of electronic data and information. As the amount of electronic data being stored has increased dramatically, it is increasingly important to be able to process and analyze that electronic data efficiently.
Data analytics have shown promising results in helping financial institutions across different segments to perform risk assessment to mitigate or minimize risk. Generally, in risk assessment there are numerous and different parameters, factors, and metrics in large data sets that are analyzed and used to build advanced data analytical and/or machine learning models. Systems and techniques have been developed that use cognitive analytics to help financial institutions, e.g., banks, to detect, minimize, and/or mitigate risk. Mitigating or minimizing risk can be critical as early detection and proactive action can make a big difference in averting financial loss. Risk is often modeled as a cost (or loss) function, for example, credit risk of entities in a portfolio, investment risk, etc. The cost or loss function typically involves two components—systematic factors that affect cost/loss and random or idiosyncratic factors that affect cost/loss. Modeling and continuous improvement of the cost function is important in risk assessment. Low probability (rare) events can lead to significant losses. Modeling and data analytics that take into account random (idiosyncratic) factors, and sampling/simulation techniques that are faster and converge with a smaller sample set, would be advantageous.
The summary of the disclosure is given to aid understanding of systems, platforms, and/or techniques to perform data analytics, including machine learning and cognitive analytics, that take into account random (idiosyncratic) factors in risk assessment, and not with an intent to limit the disclosure or the invention. The present disclosure is directed to a person of ordinary skill in the art. It should be understood that various aspects and features of the disclosure may advantageously be used separately in some instances, or in combination with other aspects and features of the disclosure in other instances. Accordingly, variations and modifications may be made to the systems, platforms, tools, programming, techniques, and/or methods for performing data analytics, modeling risk, accounting for random (idiosyncratic) factors, and random (idiosyncratic) sampling techniques to achieve different effects.
A system, platform, computer program product, and/or technique according to one or more embodiments for performing data analytics is disclosed, including modeling a loss function (risk), accounting for random (idiosyncratic) factors, and/or performing sampling/simulation techniques, for example, to assess, detect, minimize, and/or mitigate risk, for example in the financial industry (e.g., the banking, investment, and insurance fields). In one or more approaches the system, platform, tool, computer program product, and/or technique includes accounting for random factors in a loss function influenced by both systemic factors (Y) and random factors (Z). According to one or more embodiments, the system, platform, tool, computer program product, and/or method includes: computing an initial center of gravity (initial COG) of a loss function that is influenced by both systemic factors (Y) and random factors (Z); and adjusting the initial COG of the loss function toward an Origin to account for the random factors (Z). In a preferred embodiment the initial COG is computed using uniform sampling. Adjusting the initial COG to a New COG to account for the random (idiosyncratic) factors (Z) in an embodiment includes performing sampling around the initial COG and the Origin. In an alternative embodiment, a support vector machine (SVM) learning technique relying on boundary point simulations is developed to further tune the New COG. In a further embodiment, the system, platform, tool, computer program product, and/or method includes performing an importance sampling around the New COG.
According to a further approach, the system, platform, tool, computer program product, and/or method further includes: computing a loss distribution at the Origin; using the loss distribution computed at the Origin to estimate a maximum reachable loss at the Origin; computing a loss distribution at the initial COG; and using the loss distribution computed at the initial COG to estimate a maximum reachable loss at the initial COG. In an aspect, the system, platform, tool, computer program product, and/or method further includes: estimating the maximum reachable loss at the Origin based upon N sample points; and estimating the maximum reachable loss at the initial COG based upon N sample points, wherein N is in the range of about 900 to about 1100 sample points. The system, platform, tool, computer program product, and/or method according to another embodiment includes computing a distance to a New COG such that an expected maximum reachable loss of the New COG hits a user-defined loss boundary. In a further approach, the system, platform, tool, computer program product, and/or method includes: estimating the maximum reachable loss at the Origin based upon N sample points; and estimating the maximum reachable loss at the initial COG based upon N sample points, wherein the expected maximum reachable loss of the New COG is targeted to reach the critical loss boundary. In an aspect, adjusting the initial COG includes computing a New COG such that a mean loss at the New COG plus X standard deviations equals a user-defined loss boundary, where in an embodiment X is between about 2 and about 3 standard deviations (e.g., 2.5 sigma).
The New COG in an embodiment is determined by geometric ratios. Computing the New COG as determined by geometric ratios in an approach includes: performing a Monte Carlo sampling around an Origin to identify a Max loss at the Origin, wherein the Max loss at the Origin is the mean loss at the Origin plus X standard deviations from the mean loss at the Origin, and wherein the mean loss at the Origin and the standard deviation at the Origin are determined by Monte Carlo simulations; and performing a Monte Carlo sampling around the Initial COG to identify a Max loss at the Initial COG, wherein the Max loss at the Initial COG is the mean loss at the Initial COG plus X standard deviations from the mean loss at the Initial COG, and wherein the mean loss at the Initial COG and the standard deviation at the Initial COG are determined by Monte Carlo simulations. In an aspect, X standard deviations is in the range of about 2 standard deviations to about 3 standard deviations, preferably about 2.5 standard deviations. The system, platform, tool, computer program product, and/or method further includes computing a distance to the New COG from the Initial COG, wherein the distance to the New COG from the Initial COG=((the Initial COG)−(the Origin))*(((the Max loss at the Initial COG)−(the user-defined Loss Boundary))/((the Max loss at the Initial COG)−(the Max loss at the Origin))). In an approach, the New COG is computed as the Initial COG minus the distance to the New COG from the Initial COG. The system, platform, tool, computer program product, and/or method further includes in an embodiment performing an importance sampling about the New COG.
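For readability, the same geometric-ratio computation can be restated compactly; the shorthand symbols d, L_max(·), and B below are introduced here only for this restatement and are not terms used in the claims:

$$d = (\mathrm{COG}_{\mathrm{init}} - \mathrm{Origin}) \cdot \frac{L_{\max}(\mathrm{COG}_{\mathrm{init}}) - B}{L_{\max}(\mathrm{COG}_{\mathrm{init}}) - L_{\max}(\mathrm{Origin})}, \qquad \mathrm{COG}_{\mathrm{new}} = \mathrm{COG}_{\mathrm{init}} - d,$$

where B is the user-defined loss boundary and L_max(y) = μ_loss(y) + X·σ_loss(y) is the estimated maximum reachable loss at point y.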
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings.
The various aspects, features, and embodiments of methods, techniques, products, instruction programming, platforms, tools, and/or systems for performing electronic data analytics, including modeling loss function, improving sampling/simulation techniques, and/or accounting for random (idiosyncratic) factors in the loss function, will be better understood when read in conjunction with the figures provided. It may be noted that a numbered element in the figures is typically numbered according to the figure in which the element is introduced, is typically referred to by that number throughout succeeding figures, and that like reference numbers generally represent like parts of exemplary embodiments of the invention.
Embodiments are provided in the figures for the purpose of illustrating aspects, features, and/or various embodiments of the methods, techniques, products, programming, platforms, tools and/or systems for performing data analytics, for example to minimize and/or mitigate the loss/risk function (for example in the financial industry), including modeling the loss/risk function, improving sampling/simulation techniques, and/or accounting for random (idiosyncratic) factors, but the claims should not be limited to the precise arrangement, structures, features, aspects, assemblies, subassemblies, systems, platforms, circuitry, functional units, programming, instructions, embodiments, methods, processes, or devices shown. The arrangements, structures, features, aspects, assemblies, subassemblies, systems, platforms, circuitry, functional units, programming, instructions, embodiments, methods, processes, and/or devices shown may be used singularly or in combination with other arrangements, structures, features, aspects, assemblies, subassemblies, systems, circuitry, functional units, programming, instructions, methods, processes, and/or devices.
The following description is made for illustrating the general principles of the invention and is not meant to limit the inventive concepts claimed herein. In the following detailed description, numerous details are set forth in order to provide an understanding of methods, techniques, programming products, platforms, tools, and systems for performing data analytics, including modeling the loss/risk/cost function, improved sampling/simulation techniques, and/or accounting for random (idiosyncratic) factors in the loss/risk/cost function (a two-factor function); however, it will be understood by those skilled in the art that different and numerous embodiments of the methods, techniques, programming products, platforms, tools, and/or systems may be practiced without those specific details, and the claims and disclosure should not be limited to the arrangements, embodiments, features, aspects, systems, assemblies, subassemblies, structures, functional units, circuitry, programming, instructions, processes, methods, or details specifically described and shown herein. In addition, features described herein can be used in combination with other described features in each of the various possible combinations and permutations.
Unless otherwise specifically defined herein, all terms are to be given their broadest possible interpretation including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, etc. It should also be noted that, as used in the specification and the appended claims, the singular forms “a”, “an” and “the” include plural referents unless otherwise specified, and that the terms “includes”, “comprises”, and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The following discussion omits or only briefly describes performing electronic data analytics, machine learning (ML) models, deep learning, cognitive analytics, loss (risk/cost) modeling, and/or sampling/simulation techniques, which are apparent to those skilled in the art. It is assumed that those skilled in the art are familiar with performing electronic data analytics, machine learning (ML) models, cognitive analytics, loss (risk/cost) modeling, and/or electronic data sampling/simulation techniques.
As an overview, a cognitive system is a specialized computer system, or set of computer systems, configured with hardware and/or software logic (in combination with hardware logic upon which the software executes) to perform electronic data analytics and has the ability to emulate human cognitive functions. These cognitive systems convey and manipulate electronic data at various levels of interpretation which, when combined with the inherent strengths of digital computing, can solve problems with high accuracy and resilience on a large scale. IBM Watson™ is an example of one such cognitive system, which can process human readable language and identify inferences between text passages with human-like accuracy at speeds far faster than human beings and on a much larger scale. In general, such cognitive systems are able to perform a variety of such functions.
Disclosed is a system, platform, tool, computer program product, and/or process for performing data analytics, for example to assess and/or minimize a function that accounts for systemic factors (Y) and random (e.g., idiosyncratic) factors (Z), for example a loss/risk/cost function in the financial services context. In one or more embodiments, based on a cost/loss/risk function, the system, platform, tool, computer program product, and/or process identifies systematic factors (Y) and accounts for the low probability (rare) events, e.g., the random (idiosyncratic) factors (Z), associated with the function (e.g., the cost/loss/risk function). In one or more approaches, the system, platform, tool, computer program product, and/or technique applies an importance sampling/simulation technique incorporating both systematic and random (idiosyncratic) effect considerations (e.g., systemic factors (Y) and random factors (Z)) to tune the function (e.g., the cost/loss/risk function), a technique that is faster and converges with a smaller sample set. In a further aspect, the system, platform, tool, computer program product, and/or technique applies a first uniform sampling/simulation technique followed by a sliding center of gravity (COG) with importance sampling/simulation technique (or machine learning (ML) techniques) to further speed up the analysis and provide accurate estimates of rare fail events that account for and/or predict random idiosyncratic factors (Z). The system, platform, tool, computer program product, and/or technique has application in functions that include systemic factors (Y) and random factors (Z), including for example the loss/risk/cost function in the financial services context as well as other contexts and environments. That is, while the disclosure describes the function in terms of the loss/cost/risk function in the financial services field, including portfolio credit risk and investment risk, it should be understood that the disclosure has application to and in other fields, environments, and/or two-factor functions that have both systemic factors (Y) and random factors (Z).
The disclosure pertains to loss, cost, and/or risk functions that are influenced by both systemic factors (Y) and random factors (Z), where the random (e.g., low probability) events are taken into account to reduce optimism. The disclosure has application in the financial services context and other contexts and will be described in the financial services context. Financial risk, e.g., portfolio credit risk, investment risk, and other similar problems, invoke rare event simulation. That is, assessing financial risk, and other similar problems, requires accounting for rare or random (idiosyncratic) events. Credit risk management, for example, is often a rare event simulation problem because default probabilities are low for highly rated obligors and risk management is particularly concerned with rare but significant loss events resulting from a large number of defaults.
In a specific example, given a portfolio (P) containing groups (G) of counterparties (CP) based on their industry or country, the objective is to minimize loss (L) by assigning weights/positions (X) for the groups (G) within the portfolio (P). The stochastic optimization is sampling-based and at its core are loss expectation calculations and rare probability estimations. To further explain, as an example, a loss incurred by a counterparty is a function of its credit state, which is a function of economic factors including systematic factors (Y) and random (idiosyncratic) factors (Z). Systematic factors (Y) include macroeconomic factors and/or credit drivers while random factors (Z) include counterparty-specific factors, or idiosyncratic variables. The loss equation is a function of creditworthiness or credit state (c), which in turn depends on the systemic factors (Y), the random factors (Z), and the counterparty (j); one representative form is sketched below.
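The specific equation is not reproduced here; a standard single-factor form consistent with the Glasserman and Li reference cited later in this disclosure (with β_j a factor loading, and w_j and t_j hypothetical exposure and default-threshold parameters introduced only for illustration) would be:

$$c_j = \beta_j\,Y + \sqrt{1 - \beta_j^2}\;Z_j, \qquad L = \sum_j w_j\,\mathbf{1}\{c_j \le t_j\},$$

where counterparty j defaults when its credit-state index c_j falls below the threshold t_j, and w_j is the loss incurred on that default.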
While the disclosure is described in the context of the loss equation in the financial services field it can be appreciated that the system, platform, tool, and/or techniques disclosed have application to other fields and environments where an equation is a function of both systemic factors (Y) and random factors (Z), including loss, cost, and/or risk functions.
The purpose of process 100 is to guide risk decisions that affect the risk, loss, and/or cost function and/or to generate an alert when the risk, loss, and/or cost function breaks a defined threshold or channel, and includes accounting for random (idiosyncratic) factors; in an aspect it can include applying an importance sampling technique with idiosyncratic effect considerations to tune the risk, cost, and/or loss function, where the sampling is faster and converges with a smaller sample set. Given a particular environment or context, e.g., investment risk, as a pre-configuration step for example, at 110 the systematic factors (Y) and random factors (Z) associated with a function (e.g., a loss, cost, or risk function) are identified and/or determined. In an embodiment, a knowledge base can be used to determine the factors (systematic and random factors (Y, Z)) that influence the function, e.g., a loss, cost, and/or risk function. Systematic factors are generally global (e.g., market risk), and determining the factors that influence the cost/loss/risk function is beyond the scope of this disclosure.
At 120, cost/loss/risk function optimization techniques are invoked to tune the cost, loss, and/or risk function, and in an approach, account for random (idiosyncratic) effect considerations and factors (Z). According to an embodiment, as detailed herein, an importance sampling technique with idiosyncratic effect considerations can be applied to tune and optimize the cost, loss and/or risk function, where the importance sampling is faster and converges with a smaller sample set than other techniques, such as, for example, Monte Carlo simulations. Using machine learning (ML) models and techniques, the system, platform, tool, programming product and/or process 100 at 130 monitors the cost, risk, and/or loss function. In a further approach, using machine learning (ML) models and techniques, the system, platform, tool, programming product and/or process 100 at 130 continuously assigns values to the factors (systematic and random factors (Y, Z)).
At 140, if the cost, loss, and/or risk function breaks a threshold, boundary, and/or channel (e.g., a threshold value or range of values), the system, platform, tool, programming product, and/or method 100 generates an alert. The threshold can be a predetermined value based upon a user-defined loss, cost, and/or risk value. The threshold can be predetermined, pre-set, fixed, adjustable, programmable, and/or machine learned. This predefined loss, cost, and/or risk value is often referred to as the fail boundary or performance metric fail criteria. The threshold or boundary is also referred to as the loss or fail threshold and/or the loss boundary. With the alert, early detection can occur so that proactive and/or corrective action can be taken to restore the risk/cost/loss function to be below the threshold and/or within the channel (e.g., the fail or loss boundary).
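A minimal sketch of the monitor-and-alert check at 140 follows; the function name, the tracked series, and the boundary value are hypothetical placeholders, not elements of the disclosed system.

    # Sketch of the threshold/alert check at 140 (hypothetical names and values).
    def monitor(loss_series, boundary):
        """Yield an alert whenever the tracked loss/cost/risk value breaks
        the user-defined fail boundary."""
        for t, loss in enumerate(loss_series):
            if loss > boundary:
                yield f"ALERT at step {t}: loss {loss:.3f} exceeds boundary {boundary:.3f}"

    for alert in monitor([0.02, 0.04, 0.11], boundary=0.10):
        print(alert)  # proactive/corrective action would be triggered here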
As shown in the accompanying diagram, the systematic factors (Y) in the cost, loss, and/or risk function do not, in reality, take on a discrete, singular value, but span a continuous range (e.g., a distribution) of Y values, and so we can think of Y as a small bin of Y values. The random factors (Z) will have a spread effect on this small bin of Y values.
As shown in the accompanying flowchart, a process 500 is provided for tuning the loss, cost, and/or risk function while accounting for the random (idiosyncratic) factors (Z).
At 510 the initial center of gravity of fails, referred to as the initial COG, is computed, calculated, determined, identified, approximated, estimated, and/or discovered. The center of gravity of fails (COG) is the center point of a distribution of failures, e.g., when the risk, loss, and/or cost probability exceeds a failure boundary. The failure or loss boundary, also referred to as the fail or loss threshold, or as performance metric fail criteria, is generally a user-defined loss, cost, or risk failure value. The failure boundary can be predetermined, pre-set, fixed, adjustable, programmable, and/or machine learned. As indicated above, as the number of samples increases, the loss (cost or risk) spread also increases. In an approach, to facilitate determining the initial COG, a uniform sampling technique is employed. The uniform sampling technique assumes a uniform loss (cost or risk) spread about the initial COG. The initial COG is obtained according to an approach by running uniform sampling simulations and finding the mean or center of the failing sample points. The objective of step 510 is to compute, determine, and/or identify the center for the importance sampling distribution, and the uniform sampling technique is one method to compute, calculate, determine, identify, approximate, estimate, and/or discover the initial COG. However, other techniques, including more rigorous sampling techniques, can be employed to compute and/or estimate the initial COG, although such techniques could be exhaustive and take longer.
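A minimal sketch of step 510 follows, assuming a single scalar systemic factor Y, a caller-supplied stochastic loss function (the random factors (Z) are drawn inside it), and a user-defined fail boundary; all names, ranges, and sample counts are hypothetical.

    import numpy as np

    def initial_cog(loss_fn, y_lo, y_hi, boundary, n=10_000, seed=0):
        """Step 510 sketch: uniformly sample the systemic factor Y, evaluate
        the (stochastic) loss at each sample, and return the mean of the
        failing sample points as the initial center of gravity of fails."""
        rng = np.random.default_rng(seed)
        y = rng.uniform(y_lo, y_hi, size=n)            # uniform sampling of Y
        losses = np.array([loss_fn(yi) for yi in y])   # Z varies inside loss_fn
        fails = y[losses > boundary]                   # failing sample points
        if fails.size == 0:
            raise ValueError("no fails observed; widen the Y range or increase n")
        return fails.mean()                            # initial COG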
In an embodiment, at 520, the initial COG is moved or shifted closer to the Origin, and in a particular approach is slid or moved to a New COG so that, in an aspect, the New COG, for a given value of Y (or bin of Y values), does not hit and/or cross the fail boundary. In an approach the initial COG is moved to account for the spread around a given bin of Y (systemic factor) values that will hit the boundary, due to the effect of random factors (Z), at a smaller Y value.
At 530, sampling is performed around the New COG to tune and optimize the loss, cost, and/or risk function. In an embodiment, an importance sampling is performed around the New COG region. Sampling is performed around the New COG to estimate rare fail probabilities at the tail of the loss distribution to guide the optimization of the cost/loss/risk function. Importance sampling provides a faster calculation of the rare event estimate and converges using a smaller sample set. Importance sampling distorts the natural distribution to prioritize the sampling to focus on the “most important” regions of the function, e.g., the loss, cost, and/or risk function. The importance sampling technique is a variance-reduction technique that is faster and converges with a smaller sample set than traditional Monte Carlo techniques, which need a large sample size in order to capture a reasonable number of fails when estimating rare fail probabilities. The importance sampling technique is described in Kanj, Rouwaida et al., “Mixture Importance Sampling and its Application to the Analysis of SRAM designs in the Presence of Rare Failure Events”, 2006 43rd ACM/IEEE Design Automation Conference, IEEE, 2006 and Glasserman and Li, “Importance Sampling for Portfolio Credit Risk”, Management Science 51.11 (2005), pp. 1643-1656, the entireties of both of which are incorporated by reference herein.
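The following sketch illustrates the basic mechanics of such an importance sampling estimator under the simplifying assumption that the systemic factor Y is naturally distributed as a standard normal; the shifted mean would be the (New) COG. The function name and parameters are hypothetical.

    import numpy as np

    def fail_prob_is(loss_fn, boundary, shift, n=5_000, seed=0):
        """Importance sampling sketch: draw Y from the distorted distribution
        N(shift, 1) instead of the natural N(0, 1) so that fails are common,
        then unbias each fail indicator with the likelihood-ratio weight
        w = pdf_natural(y) / pdf_importance(y)."""
        rng = np.random.default_rng(seed)
        y = rng.normal(loc=shift, scale=1.0, size=n)    # distorted sampling
        w = np.exp(-0.5 * y**2 + 0.5 * (y - shift)**2)  # pdf ratio N(0,1)/N(shift,1)
        fails = np.array([loss_fn(yi) for yi in y]) > boundary
        return float(np.mean(w * fails))                # unbiased P(fail) estimate

A plain Monte Carlo estimate of the same probability would need on the order of 1/P(fail) samples just to observe a handful of fails; the shifted sampler observes them routinely, and the weights restore unbiasedness.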
The importance sampling applied to systemic factors (Y) still embeds idiosyncratic random factor (Z) values and provides an approximation of the loss, cost, and/or risk spread. Modeling and continuous improvement of the cost, loss, and/or risk function is important. Low probability (rare) events can lead to significant losses. An importance sampling technique incorporating systemic factors (Y) and random idiosyncratic factors (Z) is used in an approach to predict the probabilities of rare but large losses, costs, and/or risks and tune the cost, loss, and/or risk function at 530. In an arrangement, as outlined in process 500 of the accompanying flowchart, the initial COG is adjusted toward the Origin and importance sampling is then performed about the New COG.
In process 700, at 710 the loss distribution (histogram or probability density function) or the loss distribution moments at the Origin are computed. In an embodiment, at 710 the maximum reachable loss at the Origin is estimated. In a further aspect, the maximum reachable loss at the Origin can be calculated and/or determined based upon N sample points of random factors (Z) around the Origin. The N sample points are chosen so that the number of sample points is neither too small nor too large. If the sample size (e.g., the number N of sample points) is too large the simulations will be slowed down, and if the sample size is too small then an erroneous New COG will be calculated at 730. In an embodiment, the sample size is in a range of about 800 samples to about 1200 samples, more preferably about 900 samples to about 1100 samples, and in an embodiment, the maximum reachable loss due to random idiosyncratic factors (Z) at the Origin is calculated and/or estimated using about 1000 samples at 710. It can be appreciated that other sample sizes can be used to calculate, compute, and/or estimate the maximum reachable loss at the Origin at 710.
In process 700, at 720 the loss distribution (Cumulative Distribution Function (CDF)) is computed at the Initial COG. In an embodiment, at 720 the maximum reachable loss at the initial COG is estimated. The maximum reachable loss at the initial COG due to random (idiosyncratic) factors (Z) can be calculated, determined, and/or estimated at 720 based upon N sample points of random (idiosyncratic) factors (Z) around the Initial COG. The N sample points are chosen so that the number of sample points is neither too small nor too large. If the sample size (e.g., the number N of sample points) is too large the simulations will be slowed down, and if the sample size is too small then an erroneous New COG will be calculated at 730. The sample size (e.g., the number of samples N) chosen at 720 is typically the same sample size (e.g., the same number of samples N) as used at 710 to estimate the loss distribution at the Origin. In an embodiment, the sample size is in the range of about 800 samples to about 1200 samples, more preferably about 900 samples to about 1100 samples, and in an embodiment, the maximum reachable loss at the initial COG is estimated using about 1000 samples at 720. It can be appreciated that other sample sizes can be used to calculate, compute, and/or estimate the maximum reachable loss at the initial COG at 720.
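Steps 710 and 720 can be sketched with a single helper that holds the systemic point fixed (the Origin or the initial COG) while the random factors (Z) vary inside the loss function; the helper name and defaults are hypothetical, with n defaulting to the roughly 1000 samples suggested above.

    import numpy as np

    def max_reachable_loss(loss_fn, y_center, n=1_000, x_sigma=2.5):
        """Steps 710/720 sketch: fix the systemic point y_center, let the
        random factors (Z) inside loss_fn vary over n samples, and estimate
        the maximum reachable loss as the sample mean plus x_sigma sample
        standard deviations of the observed losses."""
        losses = np.array([loss_fn(y_center) for _ in range(n)])
        return losses.mean() + x_sigma * losses.std(ddof=1)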
At 730, the distance to the New COG is computed such that the maximum reachable loss distribution of the New COG hits (or is within (e.g., less than)) the fail boundary. In an approach, the distance to the New COG such that the loss distribution of the New COG is at (or within) the fail boundary is computed using geometric ratios. In an embodiment, the ratio of “the distance of the Origin to the New COG” to “the distance of the Origin to the Initial COG” is equal to the ratio of “the difference between the maximum reachable loss at the Origin and the loss at the fail boundary” to “the difference between the maximum reachable loss at the Origin and the maximum reachable loss at the initial COG”.
In reality, systemic factors (Y) as explained above are not discrete singular values but are a continuous distribution of values, and in addition the random (idiosyncratic) factors (Z) will have a spread effect on the distribution of the loss, cost, and/or risk function. There is a need to take into consideration the spread effect on the loss, cost, and/or risk function. In an aspect, the systemic factor value can be thought of as a small enough bin of Y values (e.g., a continuous range of Y values in a bin). In an approach, the fact that the spread around a given bin of systemic factor (Y) values will hit the fail boundary, due to random factors (Z), at a smaller value of Y is taken into account. Assuming 1000 Monte Carlo samples are divided into 10 bins and 240 samples fall in the central bin, an approximate 2.5 sigma spread in loss around the central bin can be achieved. This can be used to estimate the random (idiosyncratic) factors/effects (Z). If we assume 240 samples at Y approximately equal to 0 (Y ≈ 0), the result is the loss variation due to random factors (Z):
Loss_max@Origin = μ_loss@Origin + 2.5 · σ_loss@Origin
This represents one embodiment. That is, the maximum reachable loss at the Origin that will be used in this embodiment to slide the COG is the mean value of the loss at the Origin plus 2.5 standard deviations at the Origin (e.g., 2.5 times the standard deviation of the loss distribution obtained at the Origin, or 2.5 sigma). Similarly, if we assume 240 samples at Y equal to the Initial COG, the maximum loss variation at the Initial COG due to random factors (Z) is:
Loss_max@COG = μ_loss@COG + 2.5 · σ_loss@COG
That is, the maximum loss at the Initial COG is the mean value of the loss at the Initial COG plus 2.5 standard deviations at the Initial COG (e.g., 2.5 times the standard deviation at the Initial COG, or 2.5 sigma). The maximum loss at the New COG is targeted to be the (predefined) loss threshold or boundary:
Loss_max@NewCOG = loss threshold.
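As a rough, back-of-the-envelope check on the 2.5 sigma figure (this check is supplied here for intuition and is not part of the disclosure): the largest of 240 Gaussian samples is expected to land near the quantile

$$\Phi^{-1}\!\left(1 - \tfrac{1}{240}\right) \approx 2.6\,\sigma,$$

which is consistent with the approximate 2.5 sigma spread assumed for the central bin above.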
In an embodiment, process 800, to adjust or slide the initial COG to reduce optimism and account for random (e.g., idiosyncratic) effects (Z), includes at 810 performing Monte Carlo sampling around the Origin to determine a spread of losses (costs or risks) from which a point that represents the mean loss (cost or risk) at the Origin plus X sigma (e.g., X standard deviations) can be identified. The Origin is the nominal value for the systematic factor or variable (or set of factors or variables) Y. Monte Carlo sampling can be used to determine the mean loss and the standard deviation at the Origin, from which the point X sigma (e.g., X standard deviations) from the mean loss around the Origin can be determined. This point is represented as point “c” in the accompanying figure.
The value of X sigma (or X standard deviations) can be chosen to provide for and account for the random factors (Z) and the spread effect of such random factors (Z) on the cost, loss, and/or risk function. In one or more embodiments, the maximum reachable loss point is placed in the range of about 2 sigma (e.g., 2 standard deviations) to about 3.0 sigma (e.g., 3 standard deviations) from the mean loss at the Origin, and more preferably at about 2.5 sigma from the mean loss at the Origin as discussed above, and Monte Carlo simulations are performed around the Origin to identify the mean loss (cost or risk) at the Origin and the standard deviation (from which 2.5 sigma (e.g., 2.5 standard deviations) from the mean can be determined). It can be appreciated that X sigma can be set at 2.5 sigma but other values for X can be used, for example, 2 sigma, 2.7 sigma, 3 sigma, etc., as a matter of design choice. The number X of standard deviations (e.g., X sigma) from the mean is chosen in an aspect to account for the spread due to the sample size used in performing the Monte Carlo sampling at 810. Process 810 of performing Monte Carlo sampling around the Origin to locate the mean loss (cost or risk) about the Origin plus X sigma (e.g., 2.5 sigma or 2.5 standard deviations) from the mean loss value at the Origin identifies point “c” in the accompanying figure.
Process 800 to adjust or slide the Initial COG to reduce optimism and account for random (e.g., idiosyncratic) effects continues at 820, where Monte Carlo sampling is performed around the Initial COG to determine a spread of losses from which a point that represents the mean loss at the Initial COG plus X sigma (e.g., X standard deviations) can be identified. That is, Monte Carlo sampling can be used to determine the mean loss and the standard deviation at the Initial COG, from which the point X sigma (e.g., X standard deviations) from the mean loss around the Initial COG can be determined. The value of X sigma can be chosen to provide for and account for the random factors (Z) in the cost function. In one or more embodiments, the maximum reachable loss point is placed in the range of about 2 sigma (e.g., 2 standard deviations) to about 3.0 sigma (e.g., 3 standard deviations) from the mean loss at the Initial COG, and more preferably at about 2.5 sigma from the mean loss at the Initial COG. Monte Carlo simulations in an embodiment are performed around the Initial COG to identify the mean loss at the Initial COG plus 2.5 sigma (e.g., 2.5 standard deviations). It can be appreciated that X sigma can be 2.5 sigma but other values for X can be used, for example, 2 sigma, 2.7 sigma, 3 sigma, etc., as a matter of design choice. The number X of standard deviations (e.g., X sigma) is chosen in an aspect to account for the spread due to the sample size used in performing the Monte Carlo sampling at 820. Process 820 of performing Monte Carlo sampling around the Initial COG to locate the mean loss around the Initial COG plus X sigma (e.g., 2.5 sigma or 2.5 standard deviations) from the mean loss value at the Initial COG identifies point “a” in the accompanying figure.
The process continues to 830, where the New COG is calculated. In an embodiment, at 830 the New COG is determined such that the mean loss about the New COG shifted X sigma (e.g., 2.5 sigma or 2.5 standard deviations from the mean loss at the New COG) equals the critical loss value (e.g., point “b”). One manner of calculating and/or determining the New COG is by geometric ratios, and the New COG can be referred to as a ratioed COG. The use of geometric ratios is illustrated with the assistance of the accompanying figure and is sketched in code below.
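The following sketch implements the geometric-ratio slide of steps 810-830, reusing the hypothetical max_reachable_loss helper sketched earlier; the formula is the one given in the summary above, and all names remain illustrative.

    def new_cog(loss_fn, cog_init, boundary, n=1_000, x_sigma=2.5, origin=0.0):
        """Steps 810-830 sketch: estimate the maximum reachable loss at the
        Origin (point "c") and at the initial COG (point "a") from n Monte
        Carlo samples each, then slide the COG toward the Origin by geometric
        ratios so that the expected maximum reachable loss at the New COG is
        targeted to land on the user-defined loss boundary (point "b")."""
        lmax_origin = max_reachable_loss(loss_fn, origin, n, x_sigma)    # point "c"
        lmax_cog = max_reachable_loss(loss_fn, cog_init, n, x_sigma)     # point "a"
        # distance from the Initial COG to the New COG, by geometric ratios:
        d = (cog_init - origin) * (lmax_cog - boundary) / (lmax_cog - lmax_origin)
        return cog_init - d                                              # New COG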
Importance sampling in an embodiment is performed about the New COG computed and/or estimated at 830 to tune and optimize the loss, cost, and/or risk function. That is, the Initial COG alone could be optimistic and underestimate the probability of a fail event. Simulations around the Initial COG and the Origin are used to determine, preferably using geometric ratios, the New COG, which will identify systematic factor (Y) values that are closer to the Origin and that reach the fail boundary due to random idiosyncratic factors or effects (Z). Performing an importance sampling with the new distribution centered around the New COG captures more probable events (that are still rare) compared to the Initial COG. The probability of fail estimate can then be obtained by relying on weights that are used to unbias the estimates, similar to Kanj, Rouwaida et al., “Mixture Importance Sampling and its Application to the Analysis of SRAM designs in the Presence of Rare Failure Events”, 2006 43rd ACM/IEEE Design Automation Conference, IEEE, 2006. Thus, for a given importance sample point generated by the importance sampling distribution around the New COG, the corresponding weight is proportional to the ratio of the “pdf value of the point in the natural distribution” to the “pdf value of the point in the importance sampling distribution.”
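Tying the hypothetical sketches above together on a toy two-factor loss (the loss model, boundary, and parameter values below are invented solely for illustration and are not from the disclosure):

    import numpy as np

    _rng = np.random.default_rng(1)

    def toy_loss(y):
        """Toy two-factor loss: systemic stress y plus a random
        idiosyncratic kick z drawn inside the function."""
        z = _rng.normal(0.0, 0.3)      # random (idiosyncratic) factor Z
        return y + z

    B = 3.0                                                           # loss boundary
    cog0 = initial_cog(toy_loss, y_lo=-5.0, y_hi=5.0, boundary=B)     # step 510
    cog1 = new_cog(toy_loss, cog0, boundary=B, x_sigma=2.5)           # steps 810-830
    p_fail = fail_prob_is(toy_loss, boundary=B, shift=cog1)           # step 530
    print(f"initial COG={cog0:.2f}, New COG={cog1:.2f}, P(fail)~{p_fail:.2e}")

Because the toy loss is linear in y with a fixed Z spread, the slid COG lands where the mean loss plus 2.5 sigma meets the boundary, and the importance sampler centered there produces a low-variance estimate of the rare fail probability.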
It will be understood that one or more blocks of the flowchart illustrations in the figures, and combinations of blocks in the flowchart illustrations, can be implemented by computer program instructions.
Accordingly, blocks of the flowchart illustration support combinations of means for performing the specified functions, combinations of steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each block of the flowchart illustration, and combinations of blocks in the flowchart illustration, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or by combinations of special purpose hardware and computer instructions.
In some embodiments, the computer device and/or system 1000 may be described in the general context of computer system executable instructions, embodied as program modules stored in memory 1012, being executed by the computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks and/or implement particular input data and/or data types in accordance with the present invention.
The components of the computer system 1000 may include, but are not limited to, one or more processors or processing units 1010, a memory 1012, and a bus 1015 that operably couples various system components, including memory 1012 to processor 1010. In some embodiments, the processor 1010, which is also referred to as a central processing unit (CPU) or microprocessor, may execute one or more programs or modules 1008 that are loaded from memory 1012 to local memory 1011, where the program module(s) embody software (program instructions) that cause the processor to perform one or more operations. In some embodiments, module 1008 may be programmed into the integrated circuits of the processor 1010, loaded from memory 1012, storage device 1014, network 1018 and/or combinations thereof to local memory.
The processor (or CPU) 1010 can include various functional units, registers, buffers, execution units, caches, memories, and other units formed by integrated circuitry, and may operate according to reduced instruction set computing (“RISC”) techniques. The processor 1010 processes data according to processor cycles, synchronized, in some aspects, to an internal clock (not shown). Bus 1015 may represent one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus. The computer device and/or system 1000 may include a variety of computer system readable media, including non-transitory readable media. Such media may be any available media that is accessible by the computer system, and it may include both volatile and non-volatile media, removable and non-removable media.
Memory 1012 (sometimes referred to as system or main memory) can include computer readable media in the form of volatile memory, such as random-access memory (RAM), cache memory and/or other forms. Computer system 1000 can further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 1014 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (e.g., a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 1015 by one or more data media interfaces.
The computer system may also communicate with one or more external devices 1002 such as a keyboard, track ball, mouse, microphone, speaker, a pointing device, a display 1004, etc.; one or more devices that enable a user to interact with the computer system; and/or any devices (e.g., network card, modem, etc.) that enable the computer system to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 1006. Communications or network adapter 1016 interconnects bus 1015 with an outside network 1018, enabling the data processing system 1000 to communicate with other such systems. Additionally, an operating system such as, for example, AIX (“AIX” is a trademark of the IBM Corporation) can be used to coordinate the functions of the various components shown in the figure.
The computer system 1000 can communicate with one or more networks 1018 such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 1016. As depicted, network adapter 1016 communicates with the other components of computer system via bus 1015. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with the computer system. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk-drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
Platform and/or tool 1300 can include a cloud-based server, and can include one or more hardware processors 1310A, 1310B (also referred to as central processing units (CPUs)), a memory 1312, e.g., for storing an operating system, application program interfaces (APIs) and programs, a network interface 1316, a display device 1304, an input device 1302, and any other features common to a computing device, including a server. Further, as part of platform 1300, there is provided a local memory 1311 and/or an attached memory storage device (not shown).
In one or more aspects, platform 1300 may, for example, be any computing device that is configured to communicate with one or more web-based or cloud-based computing devices 1000 over a public or private communications network 1318. For instance, client user devices 1000 can communicate with platform 1300 where client user devices can include processing resources 1010 and memory 1012 that includes databases 1012A and 1012B.
In the embodiment depicted in the figures, network interface 1316 is configured to transmit and receive data or information to and from platform 1300, e.g., via wired or wireless connections. For example, network interface 1316 may utilize wireless technologies and communication protocols such as Bluetooth®, WIFI (e.g., 802.11a/b/g/n), cellular networks (e.g., CDMA, GSM, M2M, and 3G/4G/4G LTE, 5G), near-field communications systems, satellite communications, via a local area network (LAN), via a wide area network (WAN), or any other form of communication that allows computing device 1000 to transmit information to or receive information from platform 1300.
Display 1304 may include, for example, a computer monitor, television, smart television, a display screen integrated into a personal computing device such as, for example, laptops, smart phones, smart watches, virtual reality headsets, smart wearable devices, or any other mechanism for displaying information to a user. In one or more aspects, display 1304 may include a liquid crystal display (LCD), an e-paper/e-ink display, an organic LED (OLED) display, or other similar display technologies. In one or more aspects, display 1304 may be touch-sensitive and may also function as an input device. Input device 1302 may include, for example, a keyboard, a mouse, a touch-sensitive display, a keypad, a microphone, a camera, or other similar input devices or any other input devices that may be used alone or together to provide a user with the capability to interact with the platform 1300.
Memory 1312 may include, for example, non-transitory computer readable media in the form of volatile memory, such as random-access memory (RAM) and/or cache memory or others. Memory 1312 may include, for example, other removable/non-removable, volatile/non-volatile storage media. By way of non-limiting examples only, memory 1312 may include a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Memory 1312 of platform 1300 stores one or more modules that include, for example, programmed instructions adapted to model the cost/risk function, optimize the cost/risk function, perform sampling and simulation, and in an approach account for random factors and variables, for example, rare events, in the cost/risk function. In one embodiment, one of the programmed processing modules stored at the associated memory 1312 includes a data ingestion module 1330 that provides instructions and logic for operating circuitry to access/read large amounts of data (e.g., financial transactions, party data, financial news, etc.) for use by other modules that process and analyze the data to model the cost/risk function, optimize the cost/risk function, perform sampling and simulation, and in an approach account for random factors and variables, for example, rare events, in the cost/risk function.
In one or more embodiments, the system or platform 1300, e.g., memory 1312, contains Risk Mitigation Module 1150, which contains Data & Artificial Intelligence Module 1252, Event Monitoring Module 1254, Cost Function Module 1256, and Rare Events Simulation Module 1258. It can be appreciated that portions of the Risk Mitigation Module 1150 can be distributed throughout platform 1300. For example, the data for use by the Risk Mitigation Module can be stored outside Risk Mitigation Module 1150 and can be distributed throughout or in locations within Platform 1300. Similarly, the artificial intelligence utilized by the Risk Mitigation Module can reside within Risk Mitigation Module 1150, can be contained within a separate Machine Learning (ML) Module 1352, or can be distributed throughout the Platform 1300.
Platform 1300 optionally includes a supervisory program having instructions and logic for configuring the processors 1310, including the servers to call one or more, and in an embodiment all, of the program modules and invoke the operations of system/platform 1300. In an embodiment, such supervisory program calls provide application program interfaces (APIs) for running the programs. At least one application program interface (API) 1390 is invoked in an embodiment to receive input data, e.g., instructions, for example, the performance metric fail criteria or fail boundary. The system 1300 in an embodiment produces an alert that indicates when the cost/risk function exceeds a threshold.
In one or more embodiments, platform 1300 can be a distributed computing system, for example using cloud computing capabilities and/or features. Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be provisioned and released through a service provider or vendor. This model can include one or more characteristics, one or more service models, and one or more deployment models. Characteristics can include, for example, on-demand service; broad network access; resource pooling; rapid elasticity; and/or measured service. Service models can include, for example, Software as a Service (SaaS), Platform as a Service (PaaS), and/or Infrastructure as a Service (IaaS). Deployment models can include, for example, private cloud; community cloud; public cloud; and/or hybrid cloud. A cloud computing environment is typically service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. Typically, at the heart of cloud computing is an infrastructure that includes a network of interconnected nodes. Platform 1300 can take advantage of cloud computing to protect sensitive data when subject to a processing chain by one or more computing resources or nodes.
Referring now to the figures, an illustrative cloud computing environment is depicted.
Referring to the figures, a set of functional abstraction layers provided by the cloud computing environment is shown.
Hardware and software layer 60 includes hardware and software components. Examples of hardware components can include: mainframes 61; RISC (Reduced Instruction Set Computer) architecture based servers 62; servers 63; blade servers 64; storage devices 65; and network and networking components 66. In some embodiments, software components include network application server software 67 and database software 68.
Virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 71; virtual storage 72; virtual networks 73, including virtual private networks; virtual applications and virtual operating systems 74; and virtual clients 76.
In one example, management layer 80 may provide the functions described below. Resource provisioning 81 provides procurement, preferably dynamic procurement, of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and Pricing 82 provides cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal 83 provides access to the cloud computing environment for consumers and system administrators. Service level management 84 provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment 85 provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
Workload layer 90 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation 91; software development and lifecycle management 92; virtual classroom education delivery 93; data analytics processing 94; and transaction processing 95. Other functionality as illustrated by workload layer 96 is contemplated.
One or more embodiments of the present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Moreover, a system according to various embodiments may include a processor and logic integrated with and/or executable by the processor, the logic being configured to perform one or more of the process steps recited herein. By integrated with, what is meant is that the processor has logic embedded therewith as hardware logic, such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), etc. By executable by the processor, what is meant is that the logic is hardware logic; software logic such as firmware, part of an operating system, part of an application program; etc., or some combination of hardware and software logic that is accessible by the processor and configured to cause the processor to perform some functionality upon execution by the processor. Software logic may be stored on local and/or remote memory of any memory type, as known in the art. Any processor known in the art may be used, such as a software processor module and/or a hardware processor such as an ASIC, a FPGA, a central processing unit (CPU), an integrated circuit (IC), a graphics processing unit (GPU), etc.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. The embodiments and examples were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the disclosure. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the disclosure should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
It will be clear that the various features of the foregoing systems and/or methodologies may be combined in any way, creating a plurality of combinations from the descriptions presented above.
It will be further appreciated that embodiments of the present disclosure may be provided in the form of a service deployed on behalf of a customer to offer service on demand.
The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.