Detecting fraud continues to be an important function for business, government and other enterprises. As such enterprises rely more and more on transacting business electronically and keeping electronic records, there is an ongoing need to provide better tools adapted to interact with the varied software and data storage systems in use today.
Fraud detection includes real-time detection, such as in connection with fraudulent on-line transactions, as well as investigation of potential fraud as evidenced in database records that exhibit specific characteristics. In many cases, at least a part of the investigation takes place after the fraud has occurred. An investigator, such as an employee of the enterprise or an outside investigator hired by the enterprise, reviews the enterprise's existing records to identify suspicious data, patterns associated with fraud and/or other indicators of fraudulent activity. If such investigation yields helpful results, such as through a process to confirm suspicious data attributes based on known cases of fraud in the existing records, then the same or similar methodology can be employed to investigate current records of ongoing activity.
Presently available tools for investigators fall short of providing effective assistance.
Described below are approaches to improving fraud detection strategies.
According to a method implementation, a computer-implemented fraud detection method for determining potentially fraudulent records in a database comprises: executing a trial fraud detection strategy routine on historic records in the database, the trial detection strategy comprising multiple rules; calculating a number of the historic records determined to be proven fraud records according to the trial fraud detection strategy; calculating a number of the historic records determined to be false positive records according to the trial fraud detection strategy; calculating a trial efficiency of the trial fraud detection strategy based on a difference between the number of determined proven fraud records multiplied by a profit factor for each proven fraud record and the number of determined false positive records multiplied by a cost factor for each false positive record; and determining an adjusted fraud detection strategy with an efficiency that equals or exceeds the trial efficiency.
In some implementations, at least one of the multiple rules has a respective weighting factor, and determining an adjusted fraud detection strategy comprises optimizing a weighting factor for the at least one of the multiple rules.
In some implementations, the multiple rules have respective weighting factors, and determining an adjusted fraud detection strategy comprises using a genetic solution approach to determine adjusted weighting factors for the multiple rules such that the rules are weighted relative to each other differently in the adjusted fraud detection strategy than in the trial fraud detection strategy.
In some implementations, the genetic solution approach comprises selecting five child solutions derived by mutation from three parent solutions, and selecting next parents from the child solutions. In some implementations, the genetic solution approach is iterated at least three times in determining the adjusted fraud detection strategy.
In some implementations, the trial fraud detection strategy is executed once, and determining an adjusted fraud detection strategy comprises selecting subsets of the results from the trial fraud detection strategy and weighting each selected subset according to a predetermined method.
The foregoing and other objects, features, and advantages will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
The number of included records in the selected set, also called the threshold, can be set by a user of the system. For example, the user can set the threshold according to available resources, e.g., how much of her time can be allotted to a review of the selected set of records. Other factors could also be taken into account.
Referring to
In the example of the figure, a record is evaluated against five rules having weighting factors of 50, 10, 20, 35 and 40, respectively. Rules 1, 3 and 4 hit the record and each returns a result of 100, while Rules 2 and 5 do not hit and each returns a result of 0.
Later in the program, each of these results is divided by 100 and then multiplied by its respective weighting factor, resulting in a “rule-score of the record,” sometimes also referred to as a “weighted rule.” For the specific example, the individual rule-scores are as follows: for Rule 1, 100/100 × 50 = 50; for Rule 2, 0/100 × 10 = 0; for Rule 3, 100/100 × 20 = 20; for Rule 4, 100/100 × 35 = 35; and for Rule 5, 0/100 × 40 = 0.
A Total Score 312 is determined by summing up all of the individual rule-scores of those rules that hit this record. In the example above, the Total Score is 50 + 20 + 35 = 105.
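The same calculation can be expressed compactly in code. The following Python sketch is illustrative only, with the rule results and weighting factors taken from the example above; it is not the actual implementation:

```python
# Illustrative sketch of the scoring described above (not the actual implementation).
# Each rule returns 100 if it "hits" the record and 0 otherwise.
rule_results = [100, 0, 100, 100, 0]        # Rules 1-5 for the example record
weighting_factors = [50, 10, 20, 35, 40]    # user-defined weighting factors

# Rule-score of the record = (rule result / 100) * weighting factor
rule_scores = [(r / 100) * w for r, w in zip(rule_results, weighting_factors)]
total_score = sum(rule_scores)

print(rule_scores)   # [50.0, 0.0, 20.0, 35.0, 0.0]
print(total_score)   # 105.0
```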
It can be difficult, tedious and time consuming for the user to keep modifying which rules to include in the strategy and adjusting the respective weighting factors as she seeks to define a selected set of records for further action. In many cases, the user is interacting with the database as a business person and not as a database specialist who may have specific knowledge in how to search and/or filter records in alternative ways.
According to a new approach, the systems and methods described herein use one or more parameters that are optimized so as to yield the desired results without requiring repeated user interaction. For example, the user need not repeatedly adjust which rules to include in the detection strategy and/or their respective weighting factors. Rather, the systems and methods use factors to solve a complex problem iteratively and achieve an optimized solution automatically. The factors can be selected to define important criteria for the domain to which the records relate. The user can be prompted to supply the factors (generally, there are at least two), or they can be accessed in other ways.
In the example of fraud detection in motor vehicle insurance claims (accidents), the factors may include a profit factor and a cost factor. The profit factor can be defined as the profit that the insurer historically realized on average following the investigation and settlement of each record in which fraud was proven to have occurred. The cost factor can be defined as the cost that the insurer has incurred for investigating each record that was thought to relate to fraudulent activity but in fact turned out to be non-fraudulent. The profit and cost factors are well known to business functions and business people who are concerned with fraud detection. In other domains, different factors can be selected.
In step 104, the number of proven fraud results determined from executing the trial detection strategy is calculated. Similarly, in step 106, the number of false positive results determined from executing the trial detection strategy is calculated. In step 108, a trial efficiency is calculated. In this step, the influence of the factors is calculated. In the domain of motor vehicle insurance claims, given the profit and cost factors and the numbers of proven fraud and false positive results, the trial efficiency is the number of proven fraud results multiplied by the profit factor, minus the number of false positive results multiplied by the cost factor.
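A minimal sketch of this efficiency calculation is shown below; the function name and signature are illustrative, and the example numbers match the simulation discussed later in this description:

```python
def trial_efficiency(proven_fraud: int, false_positives: int,
                     profit_factor: float, cost_factor: float) -> float:
    """Efficiency = proven fraud * profit factor - false positives * cost factor."""
    return proven_fraud * profit_factor - false_positives * cost_factor

# Using the motor vehicle insurance example factors discussed later in this description:
print(trial_efficiency(540, 19_109, 3200, 80))   # 199280
```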
In step 110, it is determined whether a predetermined number of times or other constraint on the iterative process has been reached. If not, then in step 112 another iteration towards the solution is completed, and the process returns to step 110. Once the predetermined number of times (or other constraint) has been reached, then in step 114 an adjusted fraud detection strategy with changed parameters is output.
If not, then the process returns to step 152. If the predetermined number of times has been reached, then the strategy is set in step 162. If the predetermined number of times has not been met, then the process continues with another mutation (step 154) and a recalculation of the efficiency (step 156). If the output now satisfies the fitness function, then the strategy is set (step 160) and the process is concluded.
The first component 210 is a user interface component, which can implement a user interface for a desktop environment or for a mobile device environment (usable with smart phones, tablets and other types of mobile devices). The user interface can be implemented in HTML 5 or in any other suitable computing language/platform. The first component 210 reads detection strategies and detection method assignments from a strategy maintenance component 220.
A batch component 214 represents a background or “batch” job that is initiated when the “Start Optimization” button is pressed and that orchestrates, together with the optimization manager 222 described below, multiple iterations of the calculation of the fitness function (described below in greater detail) to provide optimal weighting factors for the strategy in question.
There is a calibration UI component 212 that can implement a calibration user interface. The calibration user interface presents the user with one or more parameters (also called “rules”) making up a detection strategy. The calibration user interface allows the user to modify at least one parameter, and thus to modify the corresponding detection strategy.
There is an optimization manager component 222 that controls optimization, including implementing three interfaces that provide different functionality depending on whether the optimization manager is called from the user side or internally from a background process. From a user's point of view, the optimization run needs to be started and/or canceled, and the results have to be retrieved. This is done on the OData side by calling the interface IF_FRA_OPT_MANAGER_CONSUMER, described below in connection with the optimization method component 224.
The optimization method component 224 executes the optimization method, which can be implemented as a genetic algorithm, as one example. The optimization method is called by the standardized optimization interface. The actual optimization run is triggered in the background job via a second interface IF_FRA_OPT_MANAGER method OPTIMIZE_WEIGHTING_FACTORS within the report FRA_DS_OPTIMIZATION.
The following steps are performed: check whether the optimistic lock for the detection strategy is met; check whether the user is allowed to calibrate; read the detection method assignments persisted on strategy level (not used from the UI); set a first progress indicator for the run; calculate the raw detection results 226 via a DB access instance of type IF_FRA_OPT_DB_ACCESS, whereby the parameter values from the UI are passed and taken into account for the raw results calculation; unlock the detection strategy 228; set the second progress indicator for the run; and start the execution of the genetic algorithm, whereby the profit and cost factors from the UI, as well as the threshold and the read detection method assignments, are passed.
Optimization Trace 230 and Trace Mode refer to functionality implemented to allow the system to trace intermediate calculation results, i.e., to save them in a database table, by setting specific parameters (Set/Get parameters) in order to fine-tune the optimization algorithm. This tracing functionality is generally switched off in a “production” environment as it can reduce the performance of the optimization.
A Control/Results functionality 232 refers to a database table that is used to control the flow of the program.
An Optimization Database Access component 244 manages a lifecycle of the connection to a database 218. Acting as a proxy to the raw optimization mass data kept in the database 218, it triggers parallel calculation of raw optimization data in the database into a raw results temporary table 248. In addition, the optimization database access component 244 triggers parallel calculation of a fitness function based on raw results in the database for an optimization method, such as a genetic algorithm. In some implementations, the parallel calculation of the fitness function occurs only once “for all beings of a generation.” “Beings” (i.e., parents, children) in this context are “sets of weighting factors.” Thus, {50, 10, 20, 35, 40} is a set of five weighting factors for the five rules of a given strategy. If the algorithm creates (randomly), say, five sets of weighting factors (corresponding to five “children-beings”), it can then calculate the five “fitness-values” for these beings in parallel, using the implemented parameterizable fitness-function. The three “fittest” (according to the fitness-value ordering) are promoted to “parent-beings” (see
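To make the generation step concrete, the following sketch shows one generation with three parents and five mutated children, where the three fittest children are promoted to parents. The mutation operator, the stand-in fitness function and the sequential evaluation are assumptions for illustration; in the described system the fitness values of a generation are calculated in parallel in the database:

```python
import random

def mutate(parent, interval, rng):
    """Derive a child set of weighting factors by adding a random offset
    within the mutation interval to each factor (a simple, assumed operator)."""
    return [w + rng.uniform(-interval, interval) for w in parent]

def next_generation(parents, fitness, rng, mu=3, lam=5, interval=10.0):
    """One generation in the spirit of the description above: lam children are
    mutated from the mu parents, their fitness values are evaluated, and the
    mu fittest children become the next parents."""
    children = [mutate(rng.choice(parents), interval, rng) for _ in range(lam)]
    ranked = sorted(children, key=fitness, reverse=True)
    return ranked[:mu]

# Example usage with a stand-in fitness function:
rng = random.Random(42)
parents = [[50, 10, 20, 35, 40], [48, 12, 22, 33, 41], [52, 9, 18, 36, 39]]
new_parents = next_generation(parents, fitness=lambda w: -sum(x * x for x in w), rng=rng)
```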
In some implementations, mass data is kept in the database and processed in parallel. Only highly aggregated results are passed from the database to the server. In this way, response time is improved and the optimization can be integrated into a dialog user interface. The “heavy” work, i.e., the processing of large amounts of data, is done by the database, rather than first transferring those large amounts of data from the database to the application server and processing them on the server in a non-optimized way. The server cannot complete the heavy work as quickly as the database because the server is designed in a database-agnostic way. Data aggregation leads to smaller data chunks that do not take much time to transfer from the database to the server, so that it is possible to see the results of the processing in real time, i.e., during a dialog transaction.
An optimization generation component 242 generates a detection strategy suited for the database 218 to calculate the raw optimization data based on the defined rules. The optimization procedure 246 is a generated database procedure for each detection strategy. The output of the optimization procedure is a temporary table of raw results that is iteratively used by the optimization algorithm.
Once the execution of the genetic algorithm has finished, the optimized weighting factors are stored in the database table FRA_D_OPT_DMA (implemented in 232) and the status and progress of the run in table FRA_D_OPT_RESULT (also implemented in 232) are set to “finished” and “100%,” respectively. Additionally, the previously passed detection method parameters are deleted from table FRA_D_OPT_DMPAR.
The third interface implemented by the optimization manager component 222 is IF_FRA_OPT_MANAGER_INT which offers the method SET_OPTIMIZATION_RESULT. It is used by the genetic algorithm to update the progress of the algorithm in relation to the overall optimization progress within table FRA_D_OPT_RESULT.
Assuming that the average savings or profit per Proven Fraud record is 3200 (i.e., a profit factor kProfit of 3200), and that the average cost per False Positive record is 80 (i.e., a cost factor kCost of 80), then a baseline result for the potential value of the first simulation can be calculated as follows:
PF*kProfit − FP*kCost = 540*3200 − 19,109*80 = 199,280
Based on the results shown in
PF*kProfit − FP*kCost = 620*3200 − 4,746*80 = 1,604,320
In the example of
In the example, a non-linear optimization problem is presented, and a best solution to the problem is generated automatically using an iterative process. According to one implementation, the iterative process is based on a genetic algorithm, but other optimization processes could also be used.
According to the approach, an initial population is defined. In
The seed functionality of a random number generation routine is used to ensure stable results for multiple runs with the same parameters. A mutation interval can be varied. In some implementations, the mutation interval is statically reduced three times during the optimization run in order to reduce the variance of the mutation as the optimization progresses. The simple mutation operator is usually sufficient, but other operators (e.g., recombination) can also be used to produce the children.
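A sketch of this behavior, assuming a simple uniform mutation and a statically shrinking interval, is shown below; the concrete seed value, interval size and reduction schedule are assumptions, not the actual implementation:

```python
import random

def run_mutations(start, generations, seed=42, start_interval=10.0):
    """Illustrative mutation loop: the fixed seed makes repeated runs with the same
    parameters reproducible, and the mutation interval is statically reduced three
    times during the run to lower the variance as the optimization progresses."""
    rng = random.Random(seed)                  # seeded RNG -> stable results across runs
    interval = start_interval
    solution = list(start)
    cut_points = {generations // 4, generations // 2, (3 * generations) // 4}
    for g in range(generations):
        if g > 0 and g in cut_points:
            interval /= 2                      # one of the three static reductions (assumed schedule)
        solution = [w + rng.uniform(-interval, interval) for w in solution]
    return solution
```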
In the
The maximum number of iterations/generations depends on the number of detection method assignments (n) of the detection strategy, as the complexity increases by a factor of √n with the dimension of the optimization problem. The base parameter for the maximum number of generations can be set to 30 by default. The default configuration parameters of the genetic algorithm (μ, λ, max generations) have been validated by empirical test runs.
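One plausible way to express this scaling is shown below; the exact formula, including the rounding, is an assumption based on the √n relationship and the default base of 30 mentioned above:

```python
import math

def max_generations(n: int, base: int = 30) -> int:
    """Assumed scaling of the generation limit with the square root of the
    number of detection method assignments n, starting from the base of 30."""
    return math.ceil(base * math.sqrt(n))

print(max_generations(5))   # 68 for a five-rule strategy
```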
The quality and convergence of the genetic algorithm depend heavily on the selection of the fitness function and the mathematical model of the optimization problem. The following mathematical symbols and functions are used:
n: Number of Detection Method Assignments
w: Vector of n weighting factors
x: Detection Object to be analyzed
T: Threshold
S(w;x): Aggregated score of detection object x with weighting factors w
Di(x): Detection Result in [0,1] of detection method i
ks: Stretch factor of sigmoid function
kProfit: Profit factor of proven fraud cases
kCost: Cost factor of false positive cases
kwd: Weight Decay factor
The detection process of classifying a detection object x (e.g. an insurance claim, a purchase order, or other type of record) as fraudulent is based on the comparison of the aggregated score S of the assigned detection methods with the threshold T:
The aggregated score is calculated by multiplying the detection result Di(x) of each assigned detection method by its weighting factor wi of the detection strategy and summing the results. To simplify the internal calculations during the optimization, a normalized threshold of T=1 is used for all optimization runs. The threshold and the resulting weighting factors can easily be scaled to the threshold of the detection strategy after the optimization is finished.
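In symbols, S(w;x) is the sum over i of wi*Di(x). The following minimal sketch expresses the score and the threshold comparison directly; the function names are illustrative:

```python
def aggregated_score(w, d):
    """S(w; x) = sum over i of w_i * D_i(x): the weighted sum of the detection
    results of all assigned detection methods for one detection object."""
    return sum(wi * di for wi, di in zip(w, d))

def is_fraudulent(w, d, threshold=1.0):
    """Classify a detection object as fraudulent if the aggregated score
    exceeds the (normalized) threshold T = 1."""
    return aggregated_score(w, d) > threshold
```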
The standard detection process uses a kind of step function σ(z) and returns fraud if the aggregated score is greater than the threshold:
The optimization needs to be able to additionally evaluate the strength of the fraud indication. This has the following advantages:
Continuous indication of progress for the optimization algorithm even if the threshold is not reached
Better generalization of optimization result for future detection objects to be analyzed
Therefore, the step function σ(z) is replaced with the sigmoid function sig(z;ks) for the optimization:
The factor ks can be used to stretch the standard sigmoid function. The factor ks=5 has shown good results in test runs and is used as default configuration.
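A common form for such a stretched sigmoid is shown below; the exact argument (here assumed to be the margin of the aggregated score over the threshold) and the exact functional form are assumptions:

```python
import math

def sig(z: float, ks: float = 5.0) -> float:
    """Stretched sigmoid: approaches 1 for scores well above the threshold and 0
    well below it. ks stretches the standard sigmoid; ks = 5 is the default
    mentioned above."""
    return 1.0 / (1.0 + math.exp(-ks * z))

# z is assumed to be S(w; x) - T, i.e. the margin of the score over the threshold.
print(sig(0.0))    # 0.5 exactly at the threshold
print(sig(0.5))    # ~0.92 for a score well above the threshold
```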
Fraud detection is used across different industries and sectors with different requirements concerning its management. In insurance, it might be acceptable to miss some fraud cases in order to manage the workload on the investigators carefully. However, in compliance scenarios, companies usually cannot accept missing even a single fraud case. Therefore, an optimization based purely on efficiency is not sufficient, as it does not consider the missed fraud cases appropriately. Hence, the optimization target needs to be parameterized according to the business analysts' needs and industry. In the insurance context, the business analyst can use the profit factor kProfit and the cost factor kCost to parameterize the profit of a proven fraud case compared to the costs of a false positive case.
As shown in
Proven Fraud Cases evaluated with the sigmoid function converge to the discrete PF KPI of 1 with high scores. Missed fraud cases converge to the discrete PF KPI of 0 with low scores.
False Positive Cases evaluated with the sigmoid function converge to the discrete FP KPI of 1 with high scores. True negative cases converge to the discrete FP KPI of 0 with low scores.
The fitness function of the optimization uses a combination of the continuous PFfit and FPerror values:
The optimization tries to place the fraud cases with a high score above the threshold by maximizing the PFfit value. Similarly, the optimization tries to minimize the FPerror by classifying the no-fraud cases with a low score. The overall fitness function is therefore maximized while taking into account the factors kProfit and kCost given by the end user. Technically, the negative fitness function is minimized in order to solve a minimization problem with the optimization algorithm.
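As a sketch of this parameterized target, the following assumes that PFfit and FPerror are combined linearly with the business factors; the exact combination used by the implementation is not reproduced here:

```python
def fitness(pf_fit: float, fp_error: float, k_profit: float, k_cost: float) -> float:
    """Parameterized optimization target: reward the continuous proven-fraud fit,
    penalize the continuous false-positive error, weighted by the business factors."""
    return k_profit * pf_fit - k_cost * fp_error

def objective(pf_fit, fp_error, k_profit, k_cost):
    # The negative fitness is minimized so that the problem can be handed to a
    # minimization-style optimization algorithm, as described above.
    return -fitness(pf_fit, fp_error, k_profit, k_cost)
```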
If the optimization is running for many generations, the fitness function can be slightly improved by pushing the weighting factors to high positive or negative values. However, the real fraud detection result no longer improves. Therefore, a weight decay (wd) term that penalizes high weighting factors is added to the fitness function:
The weight decay uses the typical quadratic error of the weighting factors. The weight decay is normalized with the number of assigned detection methods n that are aggregated during the weight decay calculation. The weight decay is additionally normalized with weight decay factor kwd which is by default set to 1/100 of the calculated proven fraud fitness value.
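For illustration, the penalty term might look like the following sketch, which assumes the "typical quadratic error" is the mean of the squared weighting factors and that the decay is subtracted from the fitness:

```python
def weight_decay(w, k_wd):
    """Quadratic weight decay, normalized by the number of detection method
    assignments n. k_wd is by default 1/100 of the calculated proven fraud
    fitness value (see text)."""
    n = len(w)
    return k_wd * sum(wi * wi for wi in w) / n

def fitness_with_decay(pf_fit, fp_error, k_profit, k_cost, w, k_wd):
    # The decay is subtracted so that pushing weighting factors to extreme values is penalized.
    return k_profit * pf_fit - k_cost * fp_error - weight_decay(w, k_wd)
```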
The advantages of adding the weight decay include improving generalization for future detection objects, avoiding over-fitting on the historically classified alert items and/or improving convergence of optimization runs.
The implementation of the genetic algorithm is located in class CL_FRA_OPT_GENETIC in package FRA_CALIBRATION. The class implements the IF_FRA_OPT_METHOD interface which is used by the Optimization Manager 222 to call the algorithm at runtime. The implementation requires the method OPTIMIZE which returns a set of optimized weighting factors for the profit and cost factors given by the end user.
The interface IF_FRA_OPT_HDB_ACCESS is used to access the raw calibration results of the detection strategy in the database 218. The raw results are already calculated before the algorithm is executed. The method CALCULATE_SIGMOID calculates the continuous PFfit and FPerror values for a set of weighting factors w. Furthermore, the method GET_CLASSIFIED_ALERTS is called to get the number of classified fraud and no-fraud cases within the raw calibration results. This information is used by the genetic algorithm in method CHECK_CLASSIFIED_ALERTS to check the minimum number of classified fraud cases for the optimization. The default setting requires at least 10*n classified fraud cases. This helps to reduce over-fitting based on a very small amount of classified data.
The starting generation of parents is created in method CREATE_START_PARENTS. These μ parent solutions are created via the usual mutation operation on the initial weighting factors. The initial weighting factors are initialized with value 1/n for each detection method in method CALC_START_WF.
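A minimal sketch of this initialization is shown below; the Python names mirror, but are not, the actual ABAP methods, and the mutation interval is an assumption:

```python
import random

def calc_start_wf(n: int):
    """Initial weighting factors: 1/n for each of the n assigned detection methods."""
    return [1.0 / n] * n

def create_start_parents(n: int, mu: int, rng, interval: float = 1.0):
    """The mu starting parents are derived from the initial weighting factors
    by the usual mutation operation."""
    start = calc_start_wf(n)
    return [[w + rng.uniform(-interval, interval) for w in start] for _ in range(mu)]

parents = create_start_parents(n=5, mu=3, rng=random.Random(7))
```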
After the optimization with the genetic algorithm is finished, an adaptation of the resulting threshold and weighting factors might be required. The calibration UI can only show weighting factors between −100 and 100. If one of the optimized weighting factors is outside the valid interval, the threshold and weighting factors need to be scaled accordingly in method NORMALIZE_RESULT.
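One possible form of that scaling is sketched below. Dividing the weighting factors and the threshold by the same factor leaves the comparison of score against threshold unchanged, but whether the implementation scales exactly this way is an assumption:

```python
def normalize_result(weights, threshold, limit=100.0):
    """If any optimized weighting factor lies outside [-limit, limit], scale all
    weighting factors and the threshold by the same factor so that the comparison
    S(w; x) > T is unchanged but every factor fits the calibration UI range."""
    largest = max(abs(w) for w in weights)
    if largest <= limit:
        return weights, threshold
    scale = limit / largest
    return [w * scale for w in weights], threshold * scale

weights, threshold = normalize_result([250.0, -40.0, 10.0], threshold=1.0)
print(weights, threshold)   # [100.0, -16.0, 4.0] 0.4
```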
The weighting factor optimization is expected to work at the customer side without any additional configuration, using the delivered default settings. However, there might be scenarios that require an adaptation of the default settings. It is very difficult to validate the optimization algorithm in different customer-like scenarios with real classified data during development. Hence, it is possible to override the default settings for a specific detection strategy.
It is also possible to implement a dynamic adaptation of the mutation interval via the success rate of previous mutation steps. Further, additional supportability functions (discrete KPIs (PF, FP) for each iteration, success rate) can be added. Implementations can be configured to avoid optimizing detection method assignments without any result at all. Classified data can be split into training and validation sets to avoid over-fitting and improve generalization on future detection objects. In some implementations, there are flexible criteria to stop an optimization routine before the maximum generations are reached.
As described, the system and methods allow the investigator to use historic data to develop a strategy that yields appropriate results for use on current data.
With reference to
A computing system may have additional features. For example, the computing system 1300 includes storage 1340, one or more input devices 1350, one or more output devices 1360, and one or more communication connections 1370. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing system 1300. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing system 1300, and coordinates activities of the components of the computing system 1300.
The tangible storage 1340 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information in a non-transitory way and which can be accessed within the computing system 1300. The storage 1340 stores instructions for the software 380 implementing one or more innovations described herein.
The input device(s) 1350 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing system 1300. For video encoding, the input device(s) 1350 may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into the computing system 1300. The output device(s) 1360 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system 1300.
The communication connection(s) 1370 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.
The innovations can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing system on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.
For the sake of presentation, the detailed description uses terms like “determine” and “use” to describe computer operations in a computing system. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
Any of the computer-readable media herein can be non-transitory (e.g., volatile memory such as DRAM or SRAM, nonvolatile memory such as magnetic storage, optical storage, or the like) and/or tangible. Any of the storing actions described herein can be implemented by storing in one or more computer-readable media (e.g., computer-readable storage media or other tangible media). Any of the things (e.g., data created and used during implementation) described as stored can be stored in one or more computer-readable media (e.g., computer-readable storage media or other tangible media). Computer-readable media can be limited to implementations not consisting of a signal.
Any of the methods described herein can be implemented by computer-executable instructions in (e.g., stored on, encoded on, or the like) one or more computer-readable media (e.g., computer-readable storage media or other tangible media) or one or more computer-readable storage devices (e.g., memory, magnetic storage, optical storage, or the like). Such instructions can cause a computing device to perform the method. The technologies described herein can be implemented in a variety of programming languages.
The technologies from any example can be combined with the technologies described in any one or more of the other examples. In view of the many possible embodiments to which the principles of the disclosed technology may be applied, it should be recognized that the illustrated embodiments are examples of the disclosed technology and should not be taken as a limitation on the scope of the disclosed technology. Rather, the scope of the disclosed technology includes what is covered by the following claims. We therefore claim as our invention all that comes within the scope and spirit of the claims.