Traditional machine learning based regression models require truth set data (“actuals”) for training the model and improving its predictive accuracy. However, in some environments, a technical, regulatory, or legal constraint may impose a firewall or otherwise prohibit the availability of actuals for training the model. As a result, there are many areas where regression models would be useful but cannot be employed by conventional methods.
The present disclosure contemplates various systems and methods for overcoming the above drawbacks accompanying the related art. One aspect of the embodiments of the present disclosure is a method of training a machine learning regression model. The method may comprise defining a prediction accuracy grading function, the prediction accuracy grading function being a many-to-one function that maps prediction accuracies to proxies, each of the prediction accuracies being derivable from a respective prediction of the model and a corresponding actual. The method may further comprise receiving a plurality of proxies corresponding respectively to a plurality of predictions of the model and, for each of the plurality of proxies, deriving a corresponding approximated actual according to the prediction accuracy grading function. The method may further comprise, for each of the plurality of predictions of the model, calculating an approximated residual based on the corresponding approximated actual. The method may further comprise adjusting the model based on the approximated residuals.
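By way of illustration only, the claimed steps may be sketched in Python as follows. The decode table, the bucket-midpoint representatives, and the assumed definition of prediction accuracy as (prediction − actual)/actual are hypothetical and deployment-specific, not part of the disclosure.

```python
# Sketch of the training-side steps: each received proxy is decoded to an
# approximated actual via (an inverse of) the prediction accuracy grading
# function, and approximated residuals are computed for model adjustment.

# Hypothetical decode table: each proxy (bucket) is represented by a single
# accuracy value, here an assumed bucket midpoint.
PROXY_TO_ACCURACY = {"A+": 0.025, "A-": -0.025, "B+": 0.10, "B-": -0.10}

def approximated_actual(prediction, proxy):
    # Assuming accuracy = (prediction - actual) / actual, invert to recover
    # actual ~= prediction / (1 + accuracy).
    return prediction / (1.0 + PROXY_TO_ACCURACY[proxy])

def approximated_residuals(predictions, proxies):
    # Residual = prediction - (approximated) actual, per the method above.
    return [p - approximated_actual(p, g) for p, g in zip(predictions, proxies)]
```

The approximated residuals may then be supplied to any conventional adjustment step (e.g., a gradient update), which is omitted here.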
Another aspect of the embodiments of the present disclosure is a computer program product comprising one or more non-transitory program storage media on which are stored instructions executable by one or more processors or programmable circuits to perform operations for training a machine learning regression model. The operations may comprise defining a prediction accuracy grading function, the prediction accuracy grading function being a many-to-one function that maps prediction accuracies to proxies, each of the prediction accuracies being derivable from a respective prediction of the model and a corresponding actual. The operations may further comprise receiving a plurality of proxies corresponding respectively to a plurality of predictions of the model and, for each of the plurality of proxies, deriving a corresponding approximated actual according to the prediction accuracy grading function. The operations may further comprise, for each of the plurality of predictions of the model, calculating an approximated residual based on the corresponding approximated actual. The operations may further comprise adjusting the model based on the approximated residuals.
Another aspect of the embodiments of the present disclosure is a system for training a machine learning regression model. The system may comprise one or more databases for storing a prediction accuracy grading function, the prediction accuracy grading function being a many-to-one function that maps prediction accuracies to proxies, each of the prediction accuracies being derivable from a respective prediction of the model and a corresponding actual. The system may further comprise one or more computers operable to receive a plurality of proxies corresponding respectively to a plurality of predictions of the model and, for each of the plurality of proxies, derive a corresponding approximated actual according to the prediction accuracy grading function. The one or more computers may be further operable to calculate an approximated residual for each of the plurality of predictions of the model based on the corresponding approximated actual and to adjust the model based on the approximated residuals.
The system may further comprise one or more remote computers operable to receive the plurality of predictions of the model and a corresponding plurality of actuals, derive prediction accuracies from the predictions and the actuals, and map the prediction accuracies to proxies according to the prediction accuracy grading function to generate the plurality of proxies.
These and other features and advantages of the various embodiments disclosed herein will be better understood with respect to the following description and drawings, in which like numbers refer to like parts throughout, and in which:
The present disclosure encompasses various embodiments of systems and methods for training a machine learning based regression model, especially under circumstances in which truth set data is unavailable. The detailed description set forth below in connection with the appended drawings is intended as a description of several currently contemplated embodiments and is not intended to represent the only form in which the disclosed subject matter may be developed or utilized. The description sets forth the functions and features in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions may be accomplished by different embodiments that are also intended to be encompassed within the scope of the present disclosure. It is further understood that relational terms such as first and second and the like are used solely to distinguish one entity from another without necessarily requiring or implying any actual such relationship or order between such entities.
Referring to
In the example of Table 1, the proxies output by the prediction accuracy grading function are in the form of letter grades A to I with “better” grades (i.e., closer to A) representing better prediction accuracy and with the positive/negative indicators +/− denoting whether the prediction was too high or too low relative to the actual (i.e., whether the percentage difference was positive or negative). As can be seen, the size of each bucket need not necessarily be the same. The prediction accuracy grading function may be a piecewise function that simply maps arbitrarily defined ranges of prediction accuracy to proxies as shown. Referring back to
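By way of illustration only, such a piecewise grading function might be sketched as follows. The bucket edges below are hypothetical except where they echo values mentioned in connection with Table 1; the disclosure permits the boundaries to be arbitrarily defined.

```python
import bisect

# Hypothetical upper bucket edges for the magnitude of the prediction
# accuracy; only the A edge (0.050) and the outermost edges (0.990, 0.999)
# echo values mentioned in the text. Bucket widths shrink for less accurate
# predictions, giving higher resolution far from the actuals.
EDGES = [0.050, 0.150, 0.300, 0.500, 0.700, 0.850, 0.950, 0.990, 0.999]
GRADES = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]

def grade(accuracy):
    """Map a signed prediction accuracy to a proxy such as 'A+' or 'C-'."""
    sign = "+" if accuracy >= 0 else "-"   # prediction too high vs. too low
    i = min(bisect.bisect_left(EDGES, abs(accuracy)), len(GRADES) - 1)
    return GRADES[i] + sign
```

Because many accuracy values fall into each bucket, the mapping is many-to-one, as required of the prediction accuracy grading function.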
The operational flow of
The operational flow of
Referring back to
In general, it can be appreciated that the firewall constraints on the transmission of actuals may result in one or more of the following constraints on the proxy values:
Aspects of the techniques described herein may be represented by the multi-step process illustrated in
In general, it is contemplated that the definition of the prediction accuracy grading function, including, for example, the assignment of prediction accuracy values to buckets, may be optimized in view of the particular constraints of a given deployment of the disclosed system 100. For instance, in a case where legal, regulatory, and/or technical constraints mandate a maximum accuracy (e.g., accuracy of no greater than 90%, or +/−5%, i.e., x=0.05), it may be critical that the grading function have low resolution for predictions that are near the actuals (e.g., within 5% of the actuals). This may be reflected in the choice of larger bucket sizes for more accurate predictions. Referring to Table 1, for example, such a +/−5% constraint is met by the A− and A+ ranges of −0.050 to 0.000 and 0.000 to 0.050, respectively, assuming that directionality (i.e., positive/negative indicators like “+” and “−”) is allowed by the constraints. Meanwhile, for predictions that are far from the actuals, such as the H and I grades, it may be permissible for the resolution to be much greater, allowing for smaller buckets (e.g., −0.999 to −0.990 for I− and 0.990 to 0.999 for I+). From the perspective of the entity that is interested in protecting the data, the high resolution of these smaller buckets may be of no concern. On the other hand, from the perspective of the entity training the model 10, it may counterintuitively be the case that the training efficiency benefits greatly from high resolution evaluation of these far-from-accurate predictions. That is, the difference between a prediction's being 10,000 percent and 20,000 percent away from the actual may not be meaningful to the owner of the data but may be extremely significant for improving the performance of the model 10. The nature of the constraints may thus inform how the prediction accuracy grading function is to be defined for a given application.
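By way of illustration only, the trade-off described above can be made concrete: if the training side decodes each proxy to its bucket midpoint (an assumption, not a requirement of the disclosure), the worst-case error in the recovered accuracy is half the bucket width, so the wide A buckets enforce the mandated coarseness near the actuals while the narrow I buckets preserve resolution far from them.

```python
def worst_case_decode_error(lo, hi):
    # With a midpoint decode, the recovered accuracy is off by at most
    # half the bucket width.
    midpoint = (lo + hi) / 2.0
    return max(midpoint - lo, hi - midpoint)

# Wide A+ bucket (0.000 to 0.050): coarse near the actuals.
assert worst_case_decode_error(0.000, 0.050) == 0.025
# Narrow I+ bucket (0.990 to 0.999): fine resolution far from the actuals.
assert abs(worst_case_decode_error(0.990, 0.999) - 0.0045) < 1e-12
```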
This approach exploits the fact that, while constraints on the machine learning model may typically only set out an initial threshold limit, machine learning models often improve by identifying where the model has the largest deviations. Hence, this approach, while meeting the constraints, may place the greatest importance on the largest prediction errors (i.e., the largest residuals) to dramatically improve the machine learning model's performance.
The prediction accuracy grading function may also be defined or modified to reflect a required number of buckets and/or a desired impression of the grading scheme in the eyes of the data owner in order to nominally meet a particular set of firewall constraints, such as the following:
For example, if directionality is considered separately from the number of buckets, then the same grading function represented by Table 1 may be recast so as to simulate a simple A to F grading scheme and meet the above requirements as shown in Table 2 below:
Based on the above grading function, a grading key (which may be embodied in the worksheet 140, for example) may provide the formula by which to score a prediction based on its corresponding actual, thereby determining bucket placement (e.g., (prediction − actual)/max(set of actuals)), and further may provide the distribution-optimized buckets as represented by Table 2, above. Sample results from a grading function encode (e.g., performed by the remote computer(s) 120) may be as shown in the following Table 3.
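By way of illustration only, the grading key's scoring formula and a hypothetical set of Table 2 buckets might be applied on the remote side as follows; the edge values, and the treatment of directionality separately from bucket placement, are assumptions rather than values taken from Table 2.

```python
import bisect

# Hypothetical upper edges for the A-to-F buckets of Table 2 (the actual
# boundaries are deployment-specific and not reproduced here).
EDGES = [0.05, 0.20, 0.40, 0.70]
LETTERS = ["A", "B", "C", "D", "F"]

def score(prediction, actual, actuals):
    # Grading-key formula: (prediction - actual) / max(set of actuals).
    return (prediction - actual) / max(actuals)

def encode(prediction, actual, actuals):
    # Directionality is considered separately in this scheme, so bucket
    # placement uses the score magnitude only.
    s = abs(score(prediction, actual, actuals))
    return LETTERS[min(bisect.bisect_left(EDGES, s), len(LETTERS) - 1)]
```

Only the resulting letter grades would cross the firewall; the predictions, actuals, and raw scores would remain with the data owner.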
The various functionality and processes described herein in relation to the system 100 of
The above description is given by way of example, and not limitation. Given the above disclosure, one skilled in the art could devise variations that are within the scope and spirit of the invention disclosed herein. Further, the various features of the embodiments disclosed herein can be used alone, or in varying combinations with each other and are not intended to be limited to the specific combination described herein. Thus, the scope of the claims is not to be limited by the illustrated embodiments.
This application claims the benefit of U.S. Provisional Application No. 63/499,103, filed Apr. 28, 2023 and entitled “TRAINING REGRESSION MODELS USING TRUTH SET DATA PROXIES,” the entire contents of which are expressly incorporated by reference herein.
| Number | Date | Country |
|---|---|---|
| 63499103 | Apr 2023 | US |