Portions of the disclosure of this patent document contain materials that are subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent document or patent disclosure as it appears in the U.S. Patent and Trademark Office patent files or records solely for use in connection with consideration of the prosecution of this patent application, but otherwise reserves all copyright rights whatsoever.
The present invention generally relates to new machine learning, quantitative anomaly detection methods and systems for uncovering fraud, particularly, but not limited to, insurance fraud, such as is increasingly prevalent in, for example, automobile insurance coverage of third party bodily injury claims (hereinafter, “auto BI” claims), unemployment insurance claims (hereinafter, “UI” claims), and the like.
Fraud has long been and continues to be ubiquitous in human society. Insurance fraud is one particularly problematic type of fraud that has plagued the insurance industry for centuries and is currently on the rise.
In the insurance context, because bodily injury claims generally implicate large dollar expenditures, such claims are at enhanced risk for fraud. Bodily injury fraud occurs when an individual makes an insurance injury claim and receives money to which he or she is not entitled—by faking or exaggerating injuries, staging an accident, manipulating the facts of the accident to incorrectly assign fault, or otherwise deceiving the insurance company. Soft tissue, neck, and back injuries are especially difficult to verify independently, and therefore faking these types of injuries is popular among those who seek to defraud insurers. It is estimated that 36% of all bodily injury claims, for example, involve some type of fraud.
In the unemployment insurance arena, about $54.8 billion in UI benefits is paid annually in the U.S., of which about $6.0 billion is paid improperly. It is estimated that roughly $1.5 billion of such improper payments (about 2.7% of benefits) is paid out on fraudulent claims. Additionally, roughly half of all UI fraud is not detected by the states, as determined by state level BAM (Benefit Accuracy Measurement) audits.
One type of insurance that is particularly susceptible to claims fraud is auto BI insurance, which covers bodily injury of the claimant when the insured is deemed to have been at-fault in causing an automobile accident. Auto BI fraud increases costs for insurance companies by increasing the costs of claims, which are then passed on to insured drivers. The costs for exaggerated injuries in automobile accidents alone have been estimated to inflate the cost of insurance coverage by 17-20% overall. For example, in 1995, premiums for the typical policy holder increased about $100 to $130 per year, totaling about $9-$13 billion.
One difficulty faced in the auto BI space is that the insurer often does not know much about the claimant. Typically, the insurer has a relationship with the insured, but not with the third party claimant. Claimant information is uncovered by the claims adjuster during the course of handling a claim. Typically, adjusters in claims departments communicate with the claimants, ensure that the appropriate coverage is in place, and review police reports, medical notes, vehicle damage reports and other information in order to verify and pay the claims.
To combat fraud, many insurance companies employ Special Investigative Units (SIUs) to investigate suspicious claims to identify fraud so that payments on fraudulent claims can be reduced. If a claim appears to be suspicious, the claims adjuster can refer the claim to the SIU for additional investigation. A disadvantage of this approach is that significant time and skilled resources are required to investigate and adjudicate claim legitimacy.
Claims adjusters and SIU investigators are trained to identify specific indicators of suspicious activity. These “red flags” can tip the claims professional to fraudulent behavior when certain aspects of the claim are incongruous with other aspects. For example, red flags can include a claimant who retains an attorney for minor injuries, or injuries reported to the insurer well after the claim was reported, or, in the case of an auto BI claim, injuries that seem too severe based on the damage to the vehicle. Indeed, claims professionals are well aware that, as noted above, certain types of injuries (such as soft tissue injuries to the neck and back, which are more difficult to diagnose and verify, as compared to lacerations, broken bones, dismemberment or death) are more susceptible to exaggeration or falsification, and therefore more likely to be the bases for fraudulent claims.
There are many potential sources of fraud. Common types in the auto BI space, for example, are falsified injuries, staged accidents, and misrepresentations about the incident. Fraud is sometimes categorized as “hard fraud” and “soft fraud,” with the former including falsified injuries and incidents, and the latter covering exaggerations of severity involved with a legitimate event. In practice, however, there is a spectrum of fraud severity, covering all manner of events and misrepresentations.
Generally speaking, a fraudulent claim can be uncovered only if the claim is investigated. Many claims are processed and not investigated; and some of these claims may be fraudulent. Also, even if investigated, a fraudulent claim may not be recognized. Thus, most insurers do not know with certainty, and their databases do not accurately reflect, the status of all claims with respect to fraudulent activity. As a result, some conventional analytical tools available to mine for fraud may not work effectively. Such cases, where some claims are not properly flagged as fraudulent, are said to present issues of “censored” or “unlabeled” target variables.
Predictive models are analytical tools that segment claims to identify claims with a higher propensity to be fraudulent. These models are based on historical databases of claims and patterns of fraud within those databases. There are two basic categories of predictive models for detecting fraud, each of which works in a different manner: supervised models and unsupervised models.
Supervised models are equations, algorithms, rules, or formulas that are trained to identify a target variable of interest from a series of predictive variables. Known cases are shown to the model, which learns the patterns in and amongst the predictive variables that are associated with the target variable. When a new case is presented, the model provides a prediction based on the past data by weighting the predictive variables. Examples include linear regression, generalized linear regression, neural networks, and decision trees.
A key assumption of these models is that the target variable is complete—that it represents all known cases. In the case of modeling fraud, this assumption is violated as previously described. There are always fraudulent claims that are not investigated or, even if investigated, not uncovered. In addition, supervised predictive models are often weighted based on the types of fraud that have been historically known. New fraud schemes are always presenting themselves. If a new fraud scheme has been devised, the supervised models may not flag the claim, as this type of fraud was not part of the historical record. For these reasons, supervised predictive models are often less effective at predicting fraud than other types of events or behavior.
Unlike supervised models, unsupervised predictive models are not trained on specific target variables. Rather, unsupervised models are often multivariate and constructed to represent a larger system simultaneously. These types of models can then be combined with business knowledge and claims handling and investigation expertise to identify fraudulent cases (both of the type previously known and previously unknown). Examples of unsupervised models include cluster analysis and association rules.
Accordingly, there is a need for an unsupervised predictive model that is capable of identifying fraudulent claims, so that such claims can be identified earlier in the claim lifecycle and routed more effectively for claims handling and investigation.
Generally speaking, it is an object of the present invention to provide processes and systems that leverage advanced unsupervised statistical analytics techniques to detect fraud, for example in insurance claims. While the inventive embodiments are variously described herein, in the context of auto BI insurance claims and, also, “UI” claims, it should be understood that the present invention is not limited to uncovering fraudulent auto BI claims or UI claims, let alone fraud in the broader category of insurance claims. The present invention can have application with respect to uncovering other types of fraud.
Two principal instantiations of the invention are described hereinafter: the first, utilizing cluster analysis to identify specific clusters of claims for additional investigation; the second, utilizing association rules as tripwires to identify out-of-the-ordinary claims or “outliers” to be assigned for additional investigation.
Regarding the first instantiation, the process of clustering can segment claims into groups of claims that are homogeneous on many dimensions simultaneously. Each cluster can have a different signature, or unique center, defined by predictive variables and described by reason codes, as discussed in greater detail hereinafter (additionally, reason codes are addressed in U.S. Pat. No. 8,200,511 titled “Method and System for Determining the Importance of Individual Variables in a Statistical Model” and its progeny—namely, U.S. patent application Ser. Nos. 13/463,492 and 61/792,629—which are owned by the Applicant of the present case, and which are hereby incorporated herein by reference in their entireties). The clusters can be defined to maximize the differences and identify pockets of like claims. New claims that are filed can be assigned to a cluster, and all claims within the cluster can be treated similarly based on business experience data, such as expected rates of fraud and injury types.
Regarding the second, association rules, instantiation, a pattern of normal claims behavior can be constructed based on common associations between claim attributes (for example, 95% of claims with a head injury also have a neck injury). Probabilistic association rules can be derived on raw claims data using, for example, the Apriori Algorithm (other methods of generating probabilistic association rules can also be utilized). Independent rules can be selected that describe strong associations between claim attributes, with probabilities greater than 95%, for example. A claim can be considered to have violated the rules if it does not satisfy the initial condition (the “Left Hand Side” or “LHS” of the rule), but satisfies the subsequent condition (the “Right Hand Side” or “RHS”), or if it satisfies the LHS but not the RHS. If the rules describe a material proportion of the probability space for the RHS conditions, then violating many of the rules that map to the RHS space is an indication of an anomalous claim.
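By way of a non-limiting sketch only, the following Python fragment illustrates how such rules might be mined and how per-claim violation counts might be tallied. It assumes the open-source mlxtend package; the claim attribute names and threshold values are hypothetical.

# Illustrative sketch: derive high-confidence association rules from binary
# claim attributes and count rule violations per claim. Column names are
# hypothetical assumptions, not actual production attributes.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# claims: one row per claim, one boolean column per attribute flag
claims = pd.DataFrame({
    "TXT_HEAD":  [1, 1, 0, 1, 0],
    "TXT_NECK":  [1, 1, 0, 1, 0],
    "ATTY_DAY1": [0, 0, 1, 0, 0],
}).astype(bool)

itemsets = apriori(claims, min_support=0.2, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.95)

def violations(claim_row, rules):
    """Count rules where the claim satisfies the LHS but not the RHS,
    or the RHS but not the LHS (the 'tripwire' definition above)."""
    count = 0
    for _, r in rules.iterrows():
        lhs = all(claim_row[a] for a in r["antecedents"])
        rhs = all(claim_row[c] for c in r["consequents"])
        if lhs != rhs:
            count += 1
    return count

claims["n_violations"] = [violations(row, rules) for _, row in claims.iterrows()]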
The choice of the number of rules that must be violated before sending a claim for further investigation depends on the particular data and situation being analyzed. Setting a lower violation threshold for referring a claim to the SIU can result in more false positives; setting a higher threshold can decrease false positives, but may allow truly fraudulent claims to escape detection.
Still other aspects and advantages of the present invention will in part be obvious and will in part be apparent from the specification.
The present invention accordingly comprises the several steps and the relation of one or more of such steps with respect to each of the others, and embodies features of construction, combinations of elements, and arrangement of parts adapted to effect such steps, all as exemplified in the detailed disclosure hereinafter set forth, and the scope of the invention will be indicated in the claims.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
For a fuller understanding of the invention, reference is made to the following description, taken in connection with the accompanying drawings, in which:
FIGS. 12a and 12b graphically depict property damage claims made by a claimant over a period of time, as well as a natural binary split, to illustrate an aspect of binning variables in the context of an auto BI claim being scored using association rules according to an embodiment of the present invention;
FIGS. 14a-14d show sample results of applying the binning process illustrated in
FIGS. 17a and 17b graphically depict the length of employment in days variable for the construction industry before and after a binning process in the context of a UI claim being scored using association rules according to an embodiment of the present invention;
FIGS. 18a and 18b graphically depict the number of previous employers of an applicant over a period of time as well as a natural binary split to illustrate an aspect of binning variables in the context of a UI claim being scored using association rules according to an embodiment of the present invention; and
As noted above, two principal instantiations of the invention are described herein. The first utilizes cluster analysis to identify specific clusters of claims for additional investigation. The second utilizes association rules to quantify “normal” behavior, and thus set up a series of “tripwires” which, when violated or triggered, indicate “non-normal” claims, which can be referred to a user for additional investigation. Generally, if properly implemented, fraud is found in the “non-normal” profile. These two instantiations are described next: first the clustering, followed by the association rules.
It is also noted that in the following description the term “claim” is repeatedly used as the object, construct or device in which the fraud is assumed to be perpetrated. This was found to be convenient to describe exemplary embodiments dealing with automotive bodily injury claims, as well as unemployment insurance claims. However, this use is merely exemplary, and the techniques, processes, systems and methods described herein are equally applicable to detecting fraud in any context, in claims, transactions, submissions, negotiations of instruments, etc., for example, whether it is in a submitted insurance claim, a medical reimbursement claim, a claim for workmen's compensation, a claim for unemployment insurance benefits, a transaction in the banking system, credit card charges, negotiable instruments, and the like. All of these constructs, devices, transactions, instruments, submissions and claims are understood to be within the scope of the present invention, and exemplified in what follows by the term “claim.”
In order to separate fraudulent from legitimate claims, claims can be grouped into homogeneous clusters that are mutually exclusive (i.e., a claim can be assigned to one and only one cluster). Thus, the clusters are composed of homogeneous claims, with little variation between the claims within the cluster for the variables used in clustering. The clusters can be defined on a multivariate basis and chosen to maximize the similarity of the claims within each cluster on all the predictive variables simultaneously.
Turning now to the drawing figures (and starting with
The clusters can be defined based on the simultaneous, multivariate combination of predictive variables concerning the claim, such as, for example, the timeline during which major events in the claim unfolded (e.g., in the auto BI context, the lag between accident and reporting, the lag between reporting and involvement of an attorney, the lag to the notification of a lawsuit), the involvement of an attorney on the claim, the body part and nature of the claimant's injuries, and the damage to the different parts of the vehicle during the accident. For simplicity, it can be assumed that there are K clusters and that there are V specific predictive variables used in the clustering. The target variables (SIU investigation and fraud determination) may not be included in the clustering, first because these can be used to assess the predictive capabilities of the clusters, and second because including them could bias the clustering toward known fraud rather than toward the inherent, often counter-intuitive patterns that correlate with fraud.
In various exemplary embodiments of the present invention, the subset of predictive variables chosen for the clustering depends on the line of business and nature of the fraud that may occur. For auto BI, for example, the variables used can be the nature of the injury, the vehicle damage characteristics, and the timeline of attorney involvement. For fraud detection in other types of insurance, other flags may be relevant. For example, in the case of property insurance, relevant flags may be the timeline under which scheduled property was recorded, when calls to the police or fire department were made, etc.
Each of the V predictive variables to be included in the clustering can be standardized before application of the clustering algorithm. This standardization ensures that the scale of the underlying predictive variables does not affect the cluster definitions. Preferably, RIDIT scoring can be utilized for the purposes of standardization (
The clusters can be defined (step 50) using a variety of known algorithmic clustering methods, such as, for example, K-means clustering, hierarchical clustering, self-organizing maps, Kohonen Nets, or bagged clustering using a historical database of claims. Bagged clustering (step 51) is a preferred method as it offers stability of cluster selection and the capability to evaluate and choose the number of clusters.
Typically, selecting the number of clusters (step 52) is not a trivial task. In this case, bagged clustering can be used to determine the optimal number of clusters using the provided variables and claims. The bagged clustering provides a series of bootstrapped versions of the K-means clusters, each created on a subset of randomly sampled claims, sampled with replacement. The bagged clustering algorithm can combine these into a single cluster definition using a hierarchical clustering algorithm (step 53). Multiple numbers of clusters can be tested, k=V/10, . . . , V (where V is the number of variables). For each value of k, the proportion of variance in the underlying V variables explained by the clusters can be calculated. The value of k can be selected at the point of diminishing returns, where adding additional clusters does not greatly improve the amount of variance explained. Typically, this point is chosen based on the scree method (a/k/a the “elbow” or “hockey stick” method), identifying the point beyond which each additional cluster yields drastically less improvement.
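A minimal sketch of this bagged clustering procedure follows, assuming standardized claim data in a numeric array and using the open-source scikit-learn and SciPy packages; the bootstrap count and candidate k values are illustrative assumptions only.

# Minimal sketch of bagged clustering as described above: bootstrapped
# K-means runs are combined via hierarchical clustering of their centers,
# and k is chosen where explained variance levels off (scree/elbow method).
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

def bagged_cluster_centers(X, k, n_boot=25, rng=None):
    rng = np.random.default_rng(rng)
    centers = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
        centers.append(KMeans(n_clusters=k, n_init=10).fit(X[idx]).cluster_centers_)
    # combine bootstrapped centers into k final clusters hierarchically
    all_centers = np.vstack(centers)
    labels = fcluster(linkage(all_centers, method="ward"), t=k, criterion="maxclust")
    return np.array([all_centers[labels == j].mean(axis=0) for j in range(1, k + 1)])

def variance_explained(X, centers):
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    within = d.min(axis=1).sum()                      # within-cluster SS
    total = ((X - X.mean(axis=0)) ** 2).sum()         # total SS
    return 1.0 - within / total

# Scree/elbow selection over candidate k values, where X holds the
# standardized claim variables (N x V):
# scores = {k: variance_explained(X, bagged_cluster_centers(X, k))
#           for k in range(5, 41, 5)}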
Predictive variables can be averaged for the claims within each cluster to generate cluster centers (steps 54, 55 and 56). These centers are the high-dimensional representation of the center of each cluster. For each claim, the distance to the center of the cluster can be calculated (step 55) as the Euclidean Distance from the claim to the cluster center. Each claim can be assigned to the cluster with the minimum Euclidean Distance between the cluster center k and the claim i:
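d(i,k) = [ Σ_{v=1}^{V} ( x_{i,v} − μ_{k,v} )² ]^{1/2} (a reconstruction from the surrounding definitions, with x_{i,v} the standardized value of predictive variable v for claim i and μ_{k,v} the center of cluster k on that variable),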
where i=1, . . . , N for each claim, v=1, . . . , V for each predictive variable, and k=1, . . . , K for each cluster.
Then, claim i can be assigned to the cluster k* satisfying k* = argmin_k {d(i,k)} for that claim.
For each cluster, a reason code for each variable can be calculated (step 57). Each variable in the cluster equation can contribute to the Euclidean Distance and can form the Reason Weight (RW) from the squared difference between the cluster center and the global mean for that variable. For each variable, the Reason Weight can be calculated using the cluster mean μ_{k,v} and the global mean and standard deviation for each variable, μ_v and σ_v, respectively. The cluster mean for each variable is the mean of the variable for claims assigned to the cluster, and the global mean is the mean of the variable over all claims in the database. Then, the Reason Weight is:
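RW_{k,v} = [ (μ_{k,v} − μ_v) / σ_v ]²,

a plausible reconstruction of the omitted formula based on the squared standardized difference described above.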
The reason codes can then be sorted by the descending absolute value of the weight. The reason codes can enable the clusters to be profiled and examined to understand the types of claims that are present in each cluster.
Also, for each predictive variable, the average value within the cluster (i.e., μk,v) can be used to analyze and understand the cluster. These averages can be plotted for each cluster to produce a “heat map” (see, e.g.,
The reason codes and heat map help identify the types of claims that are present in each cluster, which allows a reviewer or investigator to act on each type of claim differently. For example, claims from certain clusters may be referred to the SIU based on the cluster profile alone, while claims from other clusters might be excluded for business reasons. As an example, the clustering methodology is likely to identify claims with very severe injuries and/or death. Claims from these clusters are less likely to involve fraud, and combatting this fraud may be difficult given the sensitive nature of the injury and presence of death. In this case, the insurer may choose not to refer any of these claims for additional investigation.
After the clusters have been defined using the clustering methodology, the clusters can be evaluated on the occurrence of investigation and fraud using the determinations on the historical claims used to define them (see, e.g.,
Appendix A sets forth an exemplary algorithm for creating clusters to evaluate new claims.
At step 100, the raw data describing the claims are loaded (via a data load process 20; see
For each claim attribute included in the scoring, standardized values for each variable are calculated based on the historical empirical quantiles for the claim (step 105). In some illustrative embodiments, this can be effected according to the cluster creation process described above with reference to
for all v_i ∈ v, v ∈ V, calculate: Γ_i = [ (v_i + 2q_i) / Σ_{j=1}^{N} v_j ] − 1; i = 1, 2, . . . , N,
where q_i = max{Empirical Historical Quantile such that v_i ≦ q_i}
Each claim can then be compared against all potential clusters to determine the cluster to which the claim belongs by calculating the distance from the claim to each cluster center (steps 110 and 115). The cluster that has the minimum distance between the claim and the cluster center is chosen as the cluster to which the claim is assigned. The distance from the claim to the cluster center can be defined using the sum of the Euclidean Distance across all variables V, as follows:
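d(i,k) = [ Σ_{v=1}^{V} ( Γ_{i,v} − μ_{k,v} )² ]^{1/2},

where Γ_{i,v} is the standardized value computed above and μ_{k,v} is the stored center of cluster k for variable v (a reconstruction consistent with the cluster creation formulas above).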
At step 120, the claim is assigned to the cluster that corresponds to the minimum/shortest distance between the scored claim and the center (i.e., the cluster with the lowest score). Claims can then be routed through the SIU referral and claims handling process according to predefined rules.
If the claim is assigned to a cluster that is assigned for investigation (in whole or in part), then the claim can be forwarded to the SIU. Additionally, exceptions can be included, so that certain types of claims are never forwarded to the SIU. These types of rules are customizable. For example, as noted above, a given claims department may determine that claims involving a death are very unlikely to be fraudulent, and in these cases SIU investigations will not be undertaken. Then, even for claims assigned to clusters intended for investigation, if a claim involves a death, this claim may not be forwarded to the SIU. This would be considered a normal handling exception. Similarly, it may be determined that some types of claims should always be forwarded to the SIU. For example, it is possible that claims involving a particular claimant are highly suspicious based on previous interactions with that claimant. In this case, the claim would be referred to the SIU regardless of the clustering process. This would be an SIU handling exception. Thus, referring to
Each cluster can be analyzed based on the historical rate of referral to the SIU and the fraud rate for those clusters that were referred. Clusters where high percentages of claims were referred and high rates of fraud were discovered represent areas where the claims department should already know to refer claims for additional investigation. However, if some claims in these clusters were not referred historically, there is an opportunity to standardize the referral process by referring such claims to the SIU, where they are likely to result in a determination of fraud.
Clusters with types of claims having high rates of referral to the SIU but low historical rates of fraud provide an opportunity to save money by not referring these claims for additional investigation as the likelihood for uncovering fraud is low.
Lastly, there are clusters that have low rates of referral, but high rates of fraud if the claims are referred. These clusters might contain previously unknown types of fraud that have been uncovered by the clustering process as a set of like claims with high rates of fraud determination. However, it is also possible that these types of claims are not referred to the SIU because of a predefined reason, such as the claim involved a death. In some embodiments, these complex claims might be fully analyzed and referred only when there is the highest likelihood of fraud. In such cases, rules can be defined, stored and automatically executed as to how to handle each cluster based on the composition and profile of each cluster.
It should be understood that if the clusters are not effective at assisting in claims handling and SIU referral (step 59 in
The rules for referral to the SIU can be preselected based on the cluster in which the claim is assigned. For example, the determination can be made that claims from five of the clusters will be forwarded to the SIU, while claims from the remaining clusters will not.
Appendix B sets forth an exemplary algorithm for scoring claims using clusters.
The following examples describe clustering analysis in greater detail, first in the context of auto BI claims and then in the context of UI claims.
Table 1 below identifies variables used in the auto BI clustering model example.
The original data extract contains raw or synthetic attributes about the claim or the claimant. To select a relevant subset of variables for fraud detection purposes, two steps can be applied:
1—Variable selection based on business rules data and common hypotheses to create a subset of the variables that are historically or hypothetically related to fraud.
2—Removal of highly correlated/similar variables:
In order to cluster the claims into like groups, it is recommended to remove variables with high degrees of correlation, to avoid double counting when measuring similarity between two claims. This is common among the text mining variables, where a 0 or 1 flag is created to indicate whether certain key words such as “head”, “neck”, “upper body injury”, etc. are detected in the claimant's accident report. Prior to clustering, the correlation of these attributes should be examined, and if two text mining variables such as “txt_head” and “txt_neck” are highly correlated (e.g., 80% or higher), only one of them should be included in the model.
When selecting variables for fraud detection, the initial round of variable selection can be rules-based, drawing on common hypotheses in the context of the fraud domain.
The starting point for variable selection is the raw data that already exists and that is collected by the insurer on the policy holders and the claimants. Additional variables may be created by combining the raw variables to create a synthetic variable that is more aligned with the business context and the fraud hypothesis. For example, the raw data on the claim can include the accident date and the date on which an attorney became involved on the case. A simple synthetic variable can be the lag time in days between the accident date and the attorney hire date.
In exemplary embodiments of the present invention, various synthetic variables can be automatically generated, with various pre-programmed parameters. For example, various combinations, both linear and nonlinear, of each internal variable with each external variable can be automatically generated, and the results tested in various clustering runs to output to a user a list of useful and predictive synthetic variables. Or, the synthetic generation process can be more structured and guided. For example, distance between various key players in nearly all fraudulent claims or transactions is often indicative. Where a claimant and the insured live very close to each other, or where a delivery address for online ordered merchandise is very far from the credit card holder's residence, or where a treating chiropractor's office is located very far from the claimant's residence or work address, often fraud is involved. Thus, automatically calculating various synthetic variable combinations of distance between various locations associated with key parties to a claim, and testing those for predictive value, can be a more fruitful approach per unit of computing time than a global “hammer and tongs” approach over an entire variable set.
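As an illustrative sketch only (the field names and the use of the haversine great-circle formula below are assumptions for demonstration, not part of any production system), such distance-based synthetic variables can be computed directly from geocoded party addresses:

# Illustrative sketch: haversine distance between geocoded addresses of
# key parties as a synthetic variable. Field names are hypothetical.
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def add_distance_variables(claim):
    """Attach synthetic distance variables to a claim dict with geocoded parties."""
    claim["CLMNT_INSURED_DIST"] = haversine_miles(
        claim["clmnt_lat"], claim["clmnt_lon"],
        claim["insured_lat"], claim["insured_lon"])
    claim["CLMNT_PROVIDER_DIST"] = haversine_miles(
        claim["clmnt_lat"], claim["clmnt_lon"],
        claim["provider_lat"], claim["provider_lon"])
    return claim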
In the exemplary process for variable selection in auto BI claims fraud detection described hereinafter, variables can be classified into, for example, 9 different categories. Examples from each category are set forth below:
In fraud detection, knowing the chronology and the timing of events can inform a hypothesis around different types of BI claims. For example, when a person is injured, the resulting claim is typically reported quickly. If there is a long lag until the claim is reported, this can suggest an attempt by the claimant to allow the injury to heal so that its actual severity is harder to verify by doctors and can be exaggerated.
Also, an attorney typically gets involved with a claim after a reasonable period of about 2-3 weeks. If the attorney is present on the first day, or if the attorney becomes involved months or years later, this can be considered suspicious. In the first instance, the claimant may be trying to pressure a quick settlement before an investigation can be performed; and in the second instance, the claimant may be trying to collect some financial benefit before a relevant statute of limitations expires, or the claimant may be trying to take advantage of the passage of time when evidence has become stale to concoct a revisionist history of the accident to the claimant's advantage.
Additionally, if the claim happens very quickly after the policy starts, this suggests suspicious behavior on the part of the insured. The expectation is that accidents will occur in a uniform distribution over the course of the policy term. Accidents occurring in the first 30 days after the policy starts are more likely to involve fraud. A typical scenario is one where the insured signs up for coverage and immediately stages an accident to gain a financial benefit quickly before premiums become due.
Variables derived based on the timeline of events can include the Policy Effective Date, the Accident Date, the Claim Report Date, the Attorney Involvement Date, the Litigation Date, and the Settlement Date.
A lag variable refers to the time period (usually, days) between milestone events. The date lags for the BI application are typically measured from the Claim Report Date of the BI portion of the claim (i.e., when the insurer finds out about the BI line).
Table 2 below sets forth examples of variables based on lag measures:
Attorney involvement and the timing around litigation can inform whether to refer a claim to the SIU. Based on this insight, relevant variables such as those set forth in Table 3 below can be included in the analysis dataset.
Looking at the type of injury in conjunction with other information about an accident (such as speed, time of day and auto damage) helps in assessing the validity of the claim. Therefore, variables that indicate if certain body parts have been injured are worthy of inclusion. A majority of the variables in this category are indicators (0 or 1) for each body part. Table 4 below sets forth examples of injury information variables. The “TXT_” prefix indicates extraction using word matching from a description provided by the claimant (or a police report or EMT or physician report).
As noted earlier, certain types of injuries are harder to verify, such as, for example, soft tissue injuries to the back and neck (lacerations, broken bones, dismemberment and death are verifiable and therefore harder to fake). Fraud tends to appear in cases where injuries are harder to verify, or the severity of the injury is harder to estimate.
Information on vehicle damage in conjunction with body injury and other claim information (such as road condition, time of day, etc.) helps in assessing the validity of the claim. Similar to body part injuries, vehicle damage information, for example, can be included as a set of indicators that are extracted from the description provided by the claimant or the police report. Table 5 below sets forth examples of vehicle damage variables. There are two prefixes used for vehicle damage indicators: 1) “CLMNT_” refers to the vehicle damage on the claimant vehicle, and 2) “PRIM_” refers to the vehicle damage on the primary insured driver's vehicle.
Although vehicle damage is easy to verify, not all types of vehicle damage signals are equally likely, and some are suspicious. For example, in a two-car rear-end accident, front bumper damage is expected on one vehicle and rear bumper damage on the other, but not roof damage. Additionally, combinations of vehicle damage should be associated with certain combinations of injuries. Neck/back soft tissue injuries, for example, can be caused by whiplash, and should therefore involve damage along the front-rear axis of the vehicle. Roof, mirror, or side-swipe damage may be indicative of suspicious combinations, where the injury observed would not be expected based on the damage to the vehicle.
Variables in both the “Injury Information” and “Vehicle Damage” categories are typically extracted from the claims adjuster's free form notes or transcribed conversations with the claimant and insured. Variables in each of these two categories are only indicators with values of 0 and 1. Depending on the technique used for text mining, a value of 1 can mean, for example, the specific word or phrase following “TXT_” exists in the recorded notes and conversations.
The raw text can be used to derive a “suspicion score” for the adjuster. Additionally, unexpected combinations of notes and information may be picked up at a more detailed level than using strict text indicators.
The techniques used for extracting the information can range from simple searches for a word or an expression to more sophisticated techniques that build probabilistic models that take into account word distributions. Using more sophisticated algorithms (e.g., natural language processing, computational linguistics, and text analytics) allows more complex variables to be identified that reflect subjective information such as, for example, the speaker's affective state, attitude or tone (e.g., sentiment analysis).
In the instant example, simple keyword searches for expressions such as “BUMPER” or “SPINAL_INJURY” can be performed with numerous computer packages (e.g., Perl, Python, Excel). For example, the value of 1 for variable “CLMNT_BUMPER” can mean that the car bumper has been damaged in the accident. For other variables, key word searching can be augmented by adding rules regarding preceding or following words or phrases to give more confidence to the variable meaning. For example, a search for “JOINT_SURGERY” may be augmented by rules that require words such as “HOSPITAL”, “ER”, “OPERATION ROOM”, etc., to be in the preceding and following phrases.
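The following Python sketch illustrates this style of keyword flagging with a simple context rule; the keyword lists and the character window are illustrative assumptions only:

# Illustrative sketch: keyword flags from adjuster notes, with a simple
# context rule augmenting the match. Keyword lists are assumptions.
import re

def txt_flag(notes, keyword):
    """1 if the keyword/expression appears in the notes, else 0."""
    return int(re.search(keyword, notes, flags=re.IGNORECASE) is not None)

def txt_flag_with_context(notes, keyword, context_words, window=60):
    """Require a supporting context word within `window` characters of the match."""
    m = re.search(keyword, notes, flags=re.IGNORECASE)
    if m is None:
        return 0
    nearby = notes[max(0, m.start() - window): m.end() + window]
    return int(any(re.search(w, nearby, flags=re.IGNORECASE) for w in context_words))

notes = "Claimant reports joint surgery at the hospital following the accident."
flags = {
    "CLMNT_BUMPER": txt_flag(notes, r"bumper"),
    "TXT_JOINT_SURGERY": txt_flag_with_context(
        notes, r"joint\s+surgery", [r"hospital", r"\bER\b", r"operating room"]),
}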
Basic information concerning the primary insured driver and the claimant is key to creating meaningful clusters of the claims. Historical information (e.g., past claims, or past SIU referrals) along with other information (e.g., addresses) should be selected for the clustering to better interpret the cluster results. Table 6 below sets forth examples of the information about the claimant and the primary insured that can be included for each claim.
While an insurer generally knows the insured party well (in a data and historical sense), the insurer may not have encountered the claimant before. The CLMSPERCMT variable keeps track of cases where the insurer has encountered the claimant on a different claim. Multiple encounters should raise a red flag. Additionally, if the claimant's and insured's addresses are within 2 miles of each other, this could indicate collusion between the parties in filing a claim, and may be a sign of fraud.
Information about the claim, focused on the accident, is essential to understanding the circumstances surrounding the accident. Facts such as road conditions, time of day, day of the week (weekend or not), and other information about the location, witnesses, etc. (as much as is available), if not consistent with other information, may raise red flags as to the validity of the claimant's information or the type of bodily injury claimed. Some exemplary variables are set forth in Table 7 below.
Another piece of information that can be used in the clustering model is the predicted severity of the claim on the day it is reported (see Table 8 below). This can be the output of a predictive model that uses a set of underlying variables to predict the severity of the claim on the day it is filed.
Generally speaking, a centile score can be a number from 1-100 that indicates the risk that the claim will have higher than average severity for a given type of injury. For example, a score of 50 would represent the “average” severity for that type of injury, while a higher score would represent a higher than average severity. Additionally, these scores may be calculated at different points during the life of the claim. The claim may be scored at the first notice of loss (FNOL), at a later date, such as 45 days after the claim was reported, or even later. These scores may be the product of a predictive modeling process. The goal of this type of score is to understand whether the claim will turn out to be more or less severe than those with the same type of injury. Assessing claims taking into account injury type and severity using predictive modeling is addressed in U.S. patent application Ser. No. 12/590,804 titled “Injury Group Based Claims Management System and Method,” which is owned by the Applicant of the present case, and which is hereby incorporated by reference herein in its entirety.
This information sheds light on the people involved in the accident (including demographic information, in particular, financial status). Given that the goal of insurance fraud is to wrongfully obtain financial benefits, this information is quite pertinent as to tendency to engage in fraudulent behavior.
On average, fraud tends to come from areas where there is more crime and often is more prevalent in no-fault states.
Although not included in the present example, fraud detection can be achieved through construction of social networks based on associations in past claims. If the individuals associated with each claim are collected and a network is constructed over time, fraud tends to cluster among certain rings, communities, and geometric distributions.
A network database can be constructed as follows:
1) Maintain a database of unique individuals encountered on claims. These represent “nodes” in the social network. Additionally, track the role in which the individual has been involved (claimant, insured, physician or other health provider, lawyer, etc.)
2) For each encounter with an individual, draw a connection to all other individuals associated with that claim. These connections are called “edges,” and form the links in the social network.
3) For each claim that was investigated by the SIU, increment the count of “investigations” associated with each node. Similarly, track and increment the number of “fraud” determinations for each node. The ratio of known fraud to investigations is the “fraud rate” for each node.
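A minimal Python sketch of this network database follows, using the open-source networkx package; the identifiers and fields are hypothetical.

# Minimal sketch of the network database described above. Nodes are unique
# individuals, edges link individuals appearing on the same claim, and
# per-node investigation/fraud counts support a "fraud rate".
import networkx as nx

G = nx.Graph()

def record_claim(claim_id, people, investigated=False, fraud=False):
    """people: list of (individual_id, role) tuples for one claim."""
    for pid, role in people:
        if pid not in G:
            G.add_node(pid, roles=set(), investigations=0, frauds=0)
        G.nodes[pid]["roles"].add(role)
        if investigated:
            G.nodes[pid]["investigations"] += 1
        if fraud:
            G.nodes[pid]["frauds"] += 1
    # connect every pair of individuals appearing on the same claim
    for i, (p1, _) in enumerate(people):
        for p2, _ in people[i + 1:]:
            G.add_edge(p1, p2, claim=claim_id)

def fraud_rate(pid):
    n = G.nodes[pid]
    return n["frauds"] / n["investigations"] if n["investigations"] else None

record_claim("C1001", [("claimant_17", "claimant"), ("atty_3", "lawyer"),
                       ("md_9", "physician")], investigated=True, fraud=True)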
Fraud has been demonstrated to circulate within geometric features in the network (small communities or cliques, for example). This analysis allows the insurer to track which small groups of lawyers and physicians tend to be involved in more fraud, or which claimants have appeared multiple times associated with different lawyers and physicians or pharmacists. As cases that were never investigated cannot have known fraud, this type of analysis helps find those rings of individuals where past behavior and association with known fraud sheds suspicion on future dealings.
Fraud for a given node can be predicted based on the fraud in the surrounding nodes (sometimes called the “ego network”). In other words, fraud tends to cluster together in certain nodes and cliques, and is not randomly distributed across the network. Communities identified through known community detection algorithms, fraud within the ego network of a node, or the shortest distance (within the social network) to a known fraud case are all potential predictive variables.
Prior to running the clustering algorithm, each null value should be removed—either by removing the observation or by imputing the missing value based on the other observations.
1) Imputing Missing Values:
If the variable value is not present for a given claim, the value can be imputed based on preselected instructions provided. This can be replicated for each variable to ensure values are provided for each variable for a given claim. For example, if a claim does not have a value for the variable ACCOPENLAG (lag in days between the accident date and the BI line open date), and the instructions require using a value of 5 days, then the value of this variable for the claim would be 5.
2) Scaling:
For each observation in the present example, there are 78 attributes with different value ranges. Some variables are binary (i.e., 0 or 1); some capture numbers of days (1, 2, . . . 365, . . . ); and some refer to dollar amounts. Since calculating the distance between observations is at the core of the clustering algorithm, these values all need to be on the same scale. If the values are not transformed to a single scale, attributes with larger values, such as household income (in thousands of dollars), dominate the distance between two observations relative to attributes such as age (0-100) or binary flags (0-1).
Accordingly, in exemplary embodiments of the present invention, three common transformation techniques, for example, can be used to scale the data:
a. Linear Transformation:
Linear transformation is computationally the easiest and most intuitive. The attribute values are transformed to a 0-1 scale. The highest value for each attribute gets a value of 1 and the other values are assigned a value linearly proportional to the max value:
Linearly Transformed Attribute=Attribute Value for the claim/Max(Attribute Value across all claims)
Despite its simplicity, this method does not take into account the frequency of the observation values.
b. Normal Distribution Scaling (Z-Transformation):
The Z-Transform centers the values for each attribute around the mean, with the mean value mapped to zero and any observation with an attribute value greater (lower) than the mean assigned a positive (negative) mapped value. To bring values to the same scale, the difference of each value from the mean is divided by the standard deviation of the values for that attribute. This method works best for attributes where the underlying distribution is normal (or close to normal). In fraud detection applications, this assumption may not be valid for many of the attributes, e.g., where the attributes have binary values.
c. RIDIT (Using Values from Initial Data)
RIDIT is a transformation utilizing the empirical cumulative distribution function derived from the raw data; it maps observed values onto the interval (−1, +1). Appendix B illustrates the formulation for the RIDIT transformation, and Table 10 below illustrates exemplary inputs and outputs.
As shown, the mapped values are distributed along the (−1,+1) range based on the frequency that the raw values appear in the input dataset. The higher the frequency of a raw value, the larger its difference from the previous value in the (−1,+1) scale.
Clustering performed in multiple iterations on the same data using each of the three scaling techniques reveals RIDIT to be the preferred scaling technique here, as it enables a reasonable differentiation between observations when clustering without over-weighting rare observations.
In contrast, the Z-Transformation is very sensitive to the dispersion in the data; when the clustering algorithm is run on data transformed under the normality assumption, it results in one very large cluster containing the majority (>60%, and up to 97%) of the observations and many smaller clusters with low numbers of observations. Such results can provide insufficient insight as they fail to adequately differentiate the claims based on a given set of underlying attributes.
Both RIDIT and linear transformation result in well-distributed and more balanced clusters in terms of the number of observations. However, despite its ease and simplicity of calculation, linear transformation can be misleading when working with data that is not uniformly distributed, since it fails to adequately account for the frequency of values for a given attribute across observations. Distance measures can be overemphasized when using linear transformation in cases where a rare observation has a raw value higher than the observation mean, which may force clusters to be skewed.
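A minimal sketch of the preferred RIDIT scaling follows, assuming the classic Bross formulation mapped onto (−1, +1); the sample data are hypothetical.

# Minimal sketch of RIDIT scaling onto (-1, +1), using the empirical
# distribution of the raw values (classic Bross formulation assumed).
import pandas as pd

def ridit(values):
    """Map raw attribute values onto (-1, +1) via the empirical distribution."""
    probs = values.value_counts(normalize=True).sort_index()
    # cumulative probability strictly below each value, plus half the
    # probability at the value, rescaled from (0, 1) to (-1, +1)
    ridits = 2.0 * (probs.cumsum() - 0.5 * probs) - 1.0
    return values.map(ridits)

ages = pd.Series([25, 25, 40, 40, 40, 65])
print(ridit(ages))   # more frequent raw values produce larger jumps in scale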
The appropriate number of clusters is dependent on the number of variables, the distribution of the attribute values, and the application. Methods based on principal component analysis (PCA), such as scree plots, for example, can be used to pick the appropriate number of clusters. An appropriate number of clusters means the generated clusters are sufficiently differentiated from one another, and relatively homogeneous internally, given the underlying data. If too few clusters are selected, the population is not segmented effectively and each cluster might be heterogeneous. Conversely, the clusters should not be so small and homogenized that there is no significant differentiation between a cluster and the one next to it. Thus, if too many clusters are picked, some clusters might be very similar to other clusters, and the dataset may be segmented too much. An exemplary consideration for choosing the number of clusters is identifying the point of diminishing returns. It should be appreciated, however, that further segmentation beyond the “point of diminishing returns” may be required to get homogeneous clusters. Homogeneity can also be defined using other statistical measures, such as, for example, the pooled multidimensional variance or the variance and distribution of the distance (Euclidean, Mahalanobis, or otherwise) of claims to the center of each cluster.
In an auto BI fraud detection application, the greater the number of clusters, the higher the percentage of (known) fraud that can be found in a given cluster. Even though the (known) fraud flag or SIU referral is not included in the clustering dataset (as noted above), with more clusters there will be clusters within which the rate of SIU referral or fraud is much higher than (e.g., more than 2×) the average rate.
Scree plots tend to yield a minimum number of clusters. While there are benefits in having more clusters, to find a cluster(s) with high (known) fraud rate, it is desirable, for example, to select a number between the minimum and a maximum of about 50 clusters. For example, for a dataset with 100 variables that are a mix of continuous, binary and categorical variables, where scree plots recommend 20 clusters, selecting about 40 can provide an appropriate balance between having unique cluster definitions and having clusters that have unusually high percentages of (known) fraud, which can be further investigated using techniques such as a decision tree.
In sum, the choice of the number of clusters should be a cost weighted trade-off between the size and homogeneity of the clusters. As a rule of thumb, at least 75% of the clusters should each have more than 1% of the data.
After running the clustering algorithm on the data and creating the clusters, each cluster can be described based on the average values of its observations. Claims, in this running example, are clustered on 128 dimensions covering the injury, vehicle parts damaged, and select claim, claimant and attorney characteristics. The claims are grouped into 40 homogeneous clusters, with each cluster highly similar on the 128 variables. Using a visualization technique such as, for example, a heat map is a preferred way to describe and define reason codes for each cluster. Each cluster has a “signature.” For example:
Based on hypotheses about potential ways of committing BI fraud, clusters with descriptions similar to these hypotheses are selected. As the heat map 300 depicted in
On the other hand, all of the claims in cluster 15 involved lower joint or lower back injuries, with very low rates of death and laceration. Given that nearly 40% of the claims resulted in a lawsuit and 82% of them involved an attorney, it is plausible to consider the likelihood of soft fraud in such claims (e.g., when the claimant includes hard-to-diagnose, low-cost joint or back pain that may not have been caused by the accident that is the subject of the claim).
The process of cluster evaluation can be automated and streamlined using a data-driven process. Referring to
Another method for profiling claims can be by using reason codes. As noted above, reason codes describe which variables are important in differentiating one cluster from another. For example, each variable used in the clustering can be a reason. Reasons can be ordered, for example, from the “most impactful” to the “least impactful” based on the distribution of claims in the cluster as compared to all claims.
If a known fraud indicator is available, then the following method may be used to determine the profile or reason a claim is selected into a particular cluster:
1. For each cluster k, calculate the fraud rate f_k, k=1, . . . , K
2. For all clusters, calculate f*, the global fraud rate over all claims
3. Set
4. For each cluster k, calculate the mean μ_{v,k}, k=1, . . . , K and v=1, . . . , V
5. For each variable v, calculate μ*_v and σ*_v, the global mean and standard deviation over all claims
6. Calculate W_{v,k} = (μ_{v,k} − μ*_v)/σ*_v, the standardized reason weight for each variable v and cluster k
7. For each cluster k, generate R+_k(j) or R−_k(j) for 0 < j ≦ V, which may act as the top j reasons claim i is more (or less) likely to be fraudulent, where R+_k(j) and R−_k(j) are ordered by |W_{v,k}|
In the absence of a known fraud rate, the following method can be used to determine the cluster profile.
1. For each cluster k, calculate the mean μ_{v,k}, k=1, . . . , K and v=1, . . . , V
2. For each variable v, calculate μ*_v and σ*_v, the global mean and standard deviation over all claims
3. Calculate W_{v,k} = (μ_{v,k} − μ*_v)/σ*_v for each variable v and cluster k
4. Set
5. For each cluster k, generate R+_k(j) and R−_k(j) for 0 < j ≦ V, which may act as the top j positive and top j negative reasons for selecting claim i into cluster k, where R+_k(j) are the top j variables ordered by W_{v,k} and R−_k(j) are the bottom j variables ordered by W_{v,k}
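The following Python sketch renders the profiling steps above in a straightforward way, taking the reason weight W_{v,k} as the standardized difference between the cluster mean and the global mean (an assumption consistent with, though not dictated by, the notation above):

# Illustrative sketch of cluster profiling by reason codes: for each
# cluster, compute W_{v,k} and report the top j positive and negative
# reasons. X is a claims-by-variables DataFrame; labels gives the
# cluster assignment for each claim.
import numpy as np
import pandas as pd

def reason_codes(X, labels, top_j=5):
    global_mean, global_std = X.mean(), X.std()
    out = {}
    for k in sorted(set(labels)):
        cluster_mean = X[labels == k].mean()
        w = (cluster_mean - global_mean) / global_std       # W_{v,k}
        out[k] = {
            "positive_reasons": w.nlargest(top_j).index.tolist(),   # R+_k(j)
            "negative_reasons": w.nsmallest(top_j).index.tolist(),  # R-_k(j)
        }
    return out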
Referring to Table 11, cluster 1, for example, is best identified as containing claims involving joint surgery, spinal surgery, or any kind of surgery; while cluster 2 is best identified as containing lacerations with surgery, or lacerations to the upper or lower extremities. Cluster 3 is best identified by containing claims where the claimant lives in areas with low percentages of seniors, short periods of time from the report date to the statute of limitations, and few neck or trunk injuries.
A decision tree is a tool for classifying and partitioning data into more homogeneous groups. It can provide a process by which, in each step, a data set (e.g., a cluster) is split over one of the attributes—resulting in two smaller datasets—one containing smaller, and the other larger, values for the attribute on which the split occurred. The decision tree is a supervised technique: a target variable, which is one of the attributes of the dataset, is selected. The resulting two sub-groups after the split thus have different mean target variable values. A decision tree can help find patterns in how target variables are distributed, and which key data attributes correlate with high or low target variable values.
In fraud detection applications, a binary target such as SIU Referral Flag, which has values of 0 (not referred) and 1 (referred), can be selected to further explore a cluster. As previously explained, clusters with reason codes aligned with fraud hypotheses or those with higher rates of SIU referral compared to average rates are considered for further investigation.
In exemplary embodiments of the present invention, one of the ways to further investigate a cluster, once formed, as described above, is to apply a decision tree algorithm to that cluster. For example, in a BI fraud detection application, a cluster with a much higher rate of SIU referral than average of all claims in the analysis universe can be further partitioned to explore what attributes contribute to the SIU referral.
Implementing a decision tree using packaged software or custom-developed computer code, the optimal split can, for example, be selected by maximizing the Sum of Squares (SS) and/or LogWorth values. Such software generally suggests a list of “Split Candidates” ranked by their SS and LogWorth scores.
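As an illustrative sketch, a single cluster can be explored with a packaged decision tree; scikit-learn is used here as a stand-in for such software (it ranks splits by impurity reduction rather than SS/LogWorth, so the results are analogous, not identical), and the depth and leaf-size parameters are assumptions:

# Illustrative sketch: exploring one cluster's claims with a decision tree
# on a binary SIU referral target, then printing the splits for review
# against fraud hypotheses.
from sklearn.tree import DecisionTreeClassifier, export_text

def explore_cluster(X_cluster, siu_referral_flag):
    """Partition one cluster's claims on a binary SIU referral target."""
    tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20)
    tree.fit(X_cluster, siu_referral_flag)
    # human-readable splits for review by claims professionals
    print(export_text(tree, feature_names=list(X_cluster.columns)))
    return tree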
In the exemplary decision tree illustrated in
On the next split of the claims with the severity score lower than 23, an optimal split candidate is the “rear end damage” to the car. This variable also makes sense from a business perspective and is aligned with the soft fraud hypothesis.
The third split on the far right branch, however, is a case where the mathematically optimal variable, i.e., the lag in days between the report date and litigation, was not selected for the split. To perform a close-to-optimal split that makes business sense, the best replacement variable was whether or not a lawsuit was filed. Based on this split, out of the 29 claims, 5 did not have a suit and were not referred to the SIU; but of the 24 that had a suit, only 20 were referred to the SIU.
By way of an additional example, the following describes a process for creating an ensemble of unsupervised techniques for fraud detection in UI claims. This involves combining multiple unsupervised and supervised detection methods for use in scoring claims for the purpose of mitigating unemployment insurance fraud.
Fraud in the UI industry is a significant cost, ultimately borne as a tax by businesses that pay into the system. Employers in each state pay a tax (premium) into a fund that pays benefits (claims) to workers who were laid off. Although the laws differ by state, generally speaking, workers are eligible to file a claim for UI benefits if they were laid off, are able to work and are looking for work.
Benefit payments in the UI system are based on earnings for the applicant during the base period. The benefit is then paid out on a weekly basis. Each week, the applicant must certify that he or she has not worked and earned any wages (or, if he or she has, indicate how much was earned). Any earnings are then removed from the benefit before it is paid out. Typically, the claimant is approved for a weekly benefit that has a maximum cap (usually ending after 26 weeks of payment, although recent extensions to the federal statutes have made this up to 99 weeks in some cases).
Individuals who knowingly conceal specifics of their eligibility for UI may be committing fraud. Fraud can be due to a number of reasons, such as, for example, understating earnings. In the U.S. today, roughly 50% of UI fraud is due to benefit year overpayment fraud—the type of fraud committed when the claimant understates earnings and receives a benefit to which he or she is not entitled. Although the majority of overpayment cases are due to unintentional clerical errors, a sizable portion are determined to be the result of fraud, where the applicant willfully deceives the state in order to receive the financial benefit.
In the typical UI fraud detection analytical effort, certain pieces of information are available to detect fraud. Broadly speaking, the information covers the eligibility, initial claim, payments or continuing claims, and the resulting adjudication information, i.e., overpayment and fraud determinations. Information derived from initial claims, continuing claims/payments, or eligibility can be used to construct potential predictors of fraud. Adjudication information is the result, indicating which claims turned out to involve fraud or overpayments.
Representative pieces of information available from these data sources are set forth in Table 12 below:
Many states utilize federal databases to identify improper UI payments based on when workers have to report earnings to the IRS. However, this process does not apply to self-employed individuals, and is easy to manipulate for predominantly cash businesses and occupations. When the wage is hard to verify, the applicant has an increased opportunity to commit fraud. Other types of fraud are similarly difficult to detect as they are hard to verify, such as eligibility requirements (e.g., the applicant is not eligible due to the reason for separation from a previous employer, or is not able and available to work if a job came up, or is not searching for work, etc.). As with fraud in other industries and insurance applications, fraud in UI tends to be larger where the claim or certain aspects of the claim are harder to verify.
To select the appropriate types of predictive variables in the UI space, variables on self-reported elements of the claim that are difficult to verify, or take a long time to verify, are collected. In UI, these are self-reported earnings, the time and date the applicant reported the earnings, the occupation, years of experience, education, industry, and other information the applicant provides at the time of the initial application, and the method by which the individual files the claim (phone versus Internet). Behavioral economic theories suggest that applicants may be more likely to deceive when reporting information through an automated system such as an automated phone screen or a website.
In this example, the specific methods for detecting anomalies and fraud in the UI space can include clustering methods as well as association rules, likelihood analysis, industry and occupational seasonal outliers, occupational transition outliers, social network analysis, and behavioral outliers related to how the individual applicant files continuing claims over the benefit lifetime. Additionally, an ensemble process can be employed by which these methods can be variously combined to create a single Fraud Score.
As described above in connection with the auto BI example, claims can be clustered using unsupervised clustering methods to identify natural homogeneous pockets with higher than average fraud propensity. In this case, due to the business case for UI, the following five different clustering experiments are designed to address some of the fraud hypotheses grounded in observing anomalous behavior—for example, getting a high weekly benefit amount for a given education level, occupation and industry:
1) Clustering Based on Account History and the Applicant's History in the System:
This experiment includes 11 variables on account and the applicant's past activity such as: Number of Past Accounts, Total Amount Paid Previously, Application Lag, Shared Work Hours, Weekly Hours Worked.
2) Clustering Based on Applicant Demographics and Payment Information:
This experiment includes 17 variables on applicant's demographics such as age, union membership, U.S. citizenship, as well as information about the payment such as number of weeks paid, tax withholding, etc.
Unlike applicant demographic data, which is known at the time of initial filing, the payment related data (e.g., number of weeks paid) are not known on the initial day of filing. Therefore, care should be taken when applying this model to catch fraud at the time of filing.
3) Clustering Based on the Applicant's Occupation and Demographics and Payment Information:
This experiment is similar to number 2 above, with the difference that the applicant's occupation indicators are added to tease out and further differentiate the clusters and discover anomalous applications.
4) Clustering Based on Employment History, Occupation and Payment Information:
This aims to cluster based on the applicant's occupation, industry in which the applicant worked and the amount of benefits the applicant received.
5) Clustering Based on the Combination of the Variables:
This captures all of the variables to create the most diverse set of variables about an application. While the cluster descriptions have a higher degree of complexity in terms of the combination of the variable levels and are harder to explain, they are more specific and detailed.
As discussed above in connection with the auto BI example, the method of standardization for the values of individual variables has a large impact on the results of a clustering method. In this example, RIDIT is used on each variable separately. In this case, as in the auto BI case, the RIDIT transformation is preferred over the Linear Transformation and Z-Score Transformation methods in terms of the post-transform distributions of each variable as well as the results of the clustering.
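An empirical RIDIT transformation can be sketched as follows, assuming each variable arrives as a pandas Series. The formulation used here, the proportion of observations strictly below a value plus half the proportion equal to it, is the standard RIDIT definition and is an assumption as to the exact variant used:

import pandas as pd

def ridit(series: pd.Series) -> pd.Series:
    # Empirical distribution of the variable's values.
    freq = series.value_counts(normalize=True).sort_index()
    below = freq.cumsum() - freq       # proportion strictly below each value
    scores = below + 0.5 * freq        # RIDIT: P(X < x) + 0.5 * P(X = x)
    return series.map(scores)

# Each clustering variable is transformed separately, e.g.:
# X_ridit = X.apply(ridit)   # X is a DataFrame of raw variables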
As described above in connection with the auto BI example, picking the appropriate number of clusters is key to the success and effectiveness of clustering for fraud detection. The number of clusters selected depends on the number of variables, underlying correlations and distributions. After RIDIT transformation, multiple numbers of clusters are considered.
The data for each experiment are individually examined and a recommended minimum number of clusters is determined based on the scree plots. The minimum number of clusters chosen is based on the internal cluster homogeneity, total variation explained, diminishing returns from adding additional clusters, and size of clusters. In each case, homogeneity is measured within each cluster using the variance of each variable, the total variance explained by the clusters, the amount of improvement in variance explained by adding a marginal cluster, and the number of claims per cluster.
However, to attain the highest fraud rate within a cluster in each experiment, all the experiments are conducted with a maximum of 50 clusters to create the highest differentiation among the clusters. Table 13 below shows the highest fraud rate found in clusters for each of the experiments:
As described above in connection with the auto BI example, each cluster is profiled by calculating the average of the relevant predictive variables within each cluster. The clusters can then be evaluated based on a heat map to enable patterns, similarities and differences between the different clusters to be readily identifiable. As illustrated in the heat map 400 depicted in
In addition to analyzing which clusters tend to contain more fraudulent claims, individual claims may be evaluated based on the distance an individual claim is from the center of the cluster to which it belongs. It should be noted that in this clustering example, it is assumed that the clustering method is a “hard” clustering method, i.e., that a claim is assigned to one and only one cluster. Examples of hard clustering methods include k-means, bagged clustering, and hierarchical clustering. “Soft” clustering methods, such as probabilistic k-means or Latent Dirichlet Allocation, among others, provide probabilities that the claim belongs to each cluster. Use of such soft methods is also contemplated by the present invention, just not in the present example.
For hard clustering methods, each claim is assigned to a single cluster. The other claims in the cluster are the peer group of claims, and the cluster should be homogeneous in the type of claims within the cluster. However, it is possible that a claim has been assigned to this cluster but is not like the other claims. That could happen because the claim is an outlier. Thus, the distance to the center of the cluster should be calculated. Here, the Mahalanobis Distance is preferred (e.g., over the Euclidean Distance) in terms of identifying outliers and anomalies, as it factors in the correlation between the variables in the dataset. Whether a given application is far from the center of its cluster depends on the distribution of other data points around the center. A data point may have a short Euclidean distance to the center, but if the data are highly concentrated along that direction, it may still be an outlier (in which case the Mahalanobis distance will be large).
The Euclidean Distance is $D_{i,d}=\sqrt{\sum_{j=1}^{J}(x_{ij}-\bar{x}_{jd})^{2}}$, where $\bar{x}_{jd}$ is the average of variable $j$ across all claims $i=1,\ldots,N_d$ within cluster $d$, and $N_d$ is the number of claims in cluster $d$. Thus, what is calculated is the square root of the sum of squared differences between each variable and the cluster average. The Mahalanobis Distance is a similar measure, except that it also involves the covariances. Written in matrix notation, this is $M_{i,d}^{2}=(X-\mu)^{T}\Sigma^{-1}(X-\mu)$. As above, each claim has a given Mahalanobis Distance to each cluster center. As the claim is assigned to only one cluster, $M_i^{2}=M_{i,d}^{2}$. For clustering methods where the claim is not assigned to a single cluster, the distance $M^{2}$ is the average of the distances to all cluster centers, weighted by the probability that the claim belongs to each cluster.
For each cluster, a histogram of the Mahalanobis Distance (M2) can be produced to facilitate the choice of cut-off points in M2 to identify individual applications as outliers.
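A minimal Python sketch of this outlier test, assuming the claims of a single cluster arrive as a NumPy array (rows are claims, columns are variables) and taking, purely for illustration, the 95th percentile of the M² histogram as the cut-off:

import numpy as np

def mahalanobis_outliers(X, cutoff_pct=95.0):
    mu = X.mean(axis=0)                   # cluster center
    cov = np.cov(X, rowvar=False)         # covariance of the variables
    cov_inv = np.linalg.pinv(cov)         # pseudo-inverse for numerical stability
    diff = X - mu
    # Squared Mahalanobis distance of each claim to the cluster center.
    m2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    cutoff = np.percentile(m2, cutoff_pct)   # cut-off chosen from the histogram
    return m2, m2 > cutoff                   # distances and outlier flags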
Claims can be identified as outliers based on multiple potential tests. The process can be as follows:
For each cluster:
Another type of unsupervised analytical method, the network analysis, can achieve fraud detection through the construction of social networks based on associations in past claims. If the individuals associated with each claim are collected and a network is constructed over time, fraud tends to cluster among certain subsets of individuals, sometimes called communities, rings, or cliques. Here, the network database can be constructed as follows:
1. Maintain a database of unique employers and employees encountered on UI claims. These represent “nodes” in the social network. Additionally, track the wages that an employee earns with the employer. If the amount is immaterial (e.g., less than 5% of the employee's earnings), then do not count the association.
2. For each employer, draw a connection to all other employers where an employee worked for both firms in a material capacity. These connections are called “edges”.
3. Remove weak links. This depends on the exact network, but links should be removed if:
For any employees who have committed fraud, or employers found to have committed fraud, increase the “fraud count” for any associated nodes on the network. Fraud committed by an employee counts towards the last employer under which the fraud was committed (or multiple employers, if the employee had multiple employers during the past benefit year).
Fraud has been demonstrated to circulate within geometric features in the network (small communities or cliques, for example). This allows the insurer to track which small groups of lawyers and physicians tend to be involved in more fraud, or which claimants have appeared multiple times. As cases that were never investigated cannot have been adjudicated as fraud, this type of analysis helps uncover those rings of individuals where past behavior and association with known fraud sheds suspicion on future dealings.
Fraud for a given node can be predicted based on the fraud in the surrounding nodes (sometimes called the “ego network”). In other words, fraud tends to cluster together in certain nodes and cliques, and is not randomly distributed across the network. Communities identified through known community detection algorithms, fraud within the ego network of a node, or the shortest distance to a known fraud case are all potential predictive variables, if named information is available. Identification of these cliques or communities is highly processor intensive. Computational algorithms exist to detect connected communities of nodes in a network. These algorithms can be applied to detect specific communities. Table 14 below shows such an example, demonstrating that some identified communities have higher rates of fraud than others, solely identified by the network structure. In this case, 63 k employers were utilized to construct the total network, with millions of links between them.
An additional representation of this information is to look at the amount of fraud in “adjacent” employers and see if that predicts anything about fraud in a given employer. Thus, for each employer, an identification can be made of all employers who are “connected” by the definition given in the steps above. This makes up the “ego network” for each employer, or the ring of employers with whom the given employer has shared employees. Totaling the fraud for each employer's ego network, then grouping the employers based on the rate of fraud in the ego network, results in the finding that employers with high rates of fraud in their ego network are more likely to have high rates of fraud themselves (see Table 15 below).
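The ego-network computation can be sketched using the networkx package, assuming an undirected employer graph whose nodes carry hypothetical fraud_count and claim_count attributes and whose edges follow the material-shared-employee definition given in the steps above:

import networkx as nx

def ego_network_fraud_rates(G: nx.Graph) -> dict:
    rates = {}
    for node in G.nodes:
        ego = list(G.neighbors(node))     # employers sharing material employees
        fraud = sum(G.nodes[n]["fraud_count"] for n in ego)
        claims = sum(G.nodes[n]["claim_count"] for n in ego)
        rates[node] = fraud / claims if claims else 0.0
    return rates

# Employers can then be grouped by ego-network fraud rate, as in Table 15.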
At the time of an initial claim for UI insurance, the claimant must report some information, such as date of birth, age, race, education, occupation and industry. The specific elements required differ from state to state. These data are typically used by the state for measuring and understanding employment conditions in the state. However, if the reported data from individuals are examined carefully, anomalies based on inconsistent reporting can be found, which might be suggestive of identity fraud. It is possible that a third party is using the social security number of a legitimate person to claim a benefit, but may not know all the details for that person.
Although this can be applied to many data elements, this example walks through generating these types of anomalies for individuals based on the occupation reported from year to year. This process will produce a matrix to identify outliers in reported changes in occupation:
1) Identify all claimants reporting more than one initial claim in the database.
2) For each pair of claims (1st and 2nd), identify the first reported occupation and the second reported occupation.
3) Aggregating across all claimants produces a matrix of size N×N, where N=number of occupations available in the database. The columns of the matrix should represent the 1st reported occupation, while the rows should represent the 2nd reported occupation.
4) For each column, divide each cell by the total for that column. The resulting numbers represent the probability that an individual from a given 1st occupation (column) will report another 2nd occupation the next time the individual files a claim.
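Steps 1) through 4) can be sketched with a pandas cross-tabulation, assuming a DataFrame with one row per claim pair and hypothetical columns first_soc and second_soc:

import pandas as pd

def occupation_transition_matrix(pairs: pd.DataFrame) -> pd.DataFrame:
    # Rows: 2nd reported occupation; columns: 1st reported occupation.
    counts = pd.crosstab(pairs["second_soc"], pairs["first_soc"])
    # Divide each cell by its column total: P(2nd occupation | 1st occupation).
    return counts / counts.sum(axis=0)

# A pair whose (second_soc, first_soc) cell falls below the chosen cut-off
# (in the 0.05%-5% range discussed below) is flagged as an anomaly.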
Table 16 below provides an example, showing the Standard Occupation Codes (SOC); it represents the upper corner of a larger matrix. It is interpreted as follows: applicants who file a claim and report working in a Management Occupation (SOC 11) will report the same SOC in the next claim 47% of the time, a Business and Financial Occupation (SOC 13) 8.7% of the time, and so forth. An outlier or anomaly is, for example, a claimant who reports working as an architect (SOC 17) in a subsequent claim, a transition that occurs rarely; such a claim should be flagged as an outlier.
The process for this is repeated by a computer using the 2-digit Major SOC, 3-digit SOC, 4-digit SOC, 5-digit SOC and 6-digit SOC. The computer can choose the appropriate level of information (which digit code) and the cut-off for the indicator of an anomaly. The cut-offs chosen should range from 0.05% to 5% in increments of 0.05% to identify the appropriate cut-off. The following decision process is applied by the computer:
1) For a given level of information (e.g., 2-digit SOC code):
2) Repeat across all levels of detail.
3) Choose the deepest level of detail and cut-off that meet the requirement of flagging less than 5% of claims.
This process should be repeated for data elements with reasonable expected changes, such as education or industry. Fixed or unchanging pieces of information should be assessed as well, such as race, gender, or age. For something like age, where the data element has a natural change, the expected age should be calculated using the time that has passed since the prior claim was filed to infer the individual's age.
Some industries have high levels of seasonal employment, and perform lay-offs during the off season. Examples include agriculture, fishing, and construction, where there are high levels of employment in the summer months and low levels of employment in the winter months. Another outlier or anomaly is when a claim is filed for an individual in a specific industry (or occupation) during the expected working season. These individuals may be misrepresenting their reasons for separation, and therefore committing fraud.
Seasonal industries and occupations can be identified using a computer by processing through the numerous codes to identify the codes where the aggregate number of filings is the highest. Then, individuals are flagged if they file claims during the working season for these seasonal industries. The process to identify the seasonal industries is as follows:
1) For each industry (or occupation), aggregate the number of claims by month (1-12) or week of the year (1-52)
2) Create a histogram of these claims, where the x-axis is the date from step 1 and the y axis is the count of claims during that time period
3) Any industry or occupation where ten times the count of unemployment filings in the minimum period is still less than the maximum count of unemployment filings (i.e., minimum count × 10 < maximum count) is considered a seasonal industry
4) Determine the seasonal period for this industry by the “elbow” or “scree point” of the distribution. This is the point where the slope of the distribution slows dramatically from steep to shallow. If such points do not exist, then choose the lowest 10% of months (or weeks) to represent the seasonal indicators
5) Any claims in the working period are anomalies.
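A sketch of steps 1) through 5), assuming a pandas DataFrame of claims with industry and month columns; the two lowest-count months stand in, illustratively, for the lowest ~10% fallback of step 4):

import pandas as pd

def seasonal_anomaly_flags(claims: pd.DataFrame) -> pd.Series:
    flags = pd.Series(False, index=claims.index)
    for industry, grp in claims.groupby("industry"):
        by_month = grp.groupby("month").size().reindex(range(1, 13), fill_value=0)
        # Step 3: seasonal if ten times the minimum monthly count is still
        # below the maximum monthly count.
        if by_month.min() * 10 < by_month.max():
            # Step 4 fallback: the months with the fewest filings approximate
            # the working season.
            working = by_month.nsmallest(2).index
            # Step 5: claims filed during the working season are anomalies.
            flags[grp.index[grp["month"].isin(working)]] = True
    return flags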
Another type of outlier is an anomalous personal habit. Individuals tend to behave in habitual ways related to when they file the weekly certification to receive the UI benefit. Individuals typically use the same method for filing the certification (i.e., web site versus phone), tend to file on the same day of the week, and often file at the same time each day. The goal is to find applicants and specific weekly certifications where the applicant had established a pattern then broke the pattern in a material way, presenting anomalous or highly unexpected behavior.
Probabilistic behavioral models can be constructed for each unique applicant, updating each week based on that individual's behavior. These models can then be used to construct predictions for the method, day of week, or time by which/when the claimant is expected to file the weekly certification. Changes in behavior can be measured in multiple ways, such as:
1) Count of weeks where the individual files outside a specified prediction interval, such as 95%
2) Change in model parameters that measure variance in the prediction (how certain the model is that the individual will react in a specific way)
3) Probability for a filing under a specific model: P(Filing|Model)
The variables used to identify anomalies can be the method of access, the day of week of the weekly certification, and the log-in time.
The method of access and day of week are both discrete variables. In this example, the method of access (MOA) can take the values {Web, Phone, Other} and the day of week (DOW) can take the values {1, 2, 3, 4, 5, 6, 7}. A Multinomial-Dirichlet Bayesian Conjugate Prior model can be used to model the likelihood and uncertainty that an individual will access using a specific method on a specific day. It should be understood that other discrete variables can be used.
For MOA, for example, the process will generate indicators that the applicant is behaving in an anomalous way:
1) For an individual applicant, gather and sort all weekly certifications in order of time from earliest to latest
3) Set prior:
4) Calculate prediction interval
5) Evaluate actual data and create anomaly flag if necessary
6) Update prior
7) Calculate changes in expected variable
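For the method of access, this loop can be sketched as follows. The prior pseudo-counts and the rule of flagging an observation that falls outside the smallest set of categories covering 95% of posterior predictive probability are illustrative assumptions, since the sub-steps above are not reproduced in full:

import numpy as np

CATEGORIES = ["Web", "Phone", "Other"]

def moa_anomaly_flags(certifications, alpha0=(1.0, 1.0, 1.0), level=0.95):
    # certifications: MOA values sorted from earliest to latest (step 1).
    alpha = np.array(alpha0, dtype=float)    # set prior pseudo-counts
    flags = []
    for obs in certifications:
        pred = alpha / alpha.sum()           # posterior predictive probabilities
        order = np.argsort(pred)[::-1]       # most likely categories first
        covered, interval = 0.0, set()
        for k in order:                      # smallest set covering `level`
            interval.add(CATEGORIES[k])
            covered += pred[k]
            if covered >= level:
                break
        flags.append(obs not in interval)    # anomaly flag, if necessary
        alpha[CATEGORIES.index(obs)] += 1.0  # conjugate update of the prior
    return flags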
In addition to the Method of Access and Day of Week outliers created by the process described above, anomalies and outliers can be created for the time that an applicant logs in to the system to file a weekly certification, assuming that the time stamp is captured.
The process of utilizing a probability model, calculating the likelihood, and updating the posterior remains the same as described above; however, the distribution is different. In this case, a Normal-Gamma Conjugate Prior model is used. The following steps outline the same process, replacing the formulas with the appropriate mathematical forms:
1) For an individual applicant, gather and sort all weekly certifications in order of time from earliest to latest.
2) Convert the time in HH:MM:SS format to a numeric format: $T = \mathrm{HH} + \mathrm{MM}/60 + \mathrm{SS}/60^{2}$.
3) The model is that the time of log-in is normally distributed, $T\sim\mathrm{Normal}(\mu,\sigma^{2})$; the parameters are then jointly distributed as a Normal-Gamma: $(\mu,\sigma^{-2})\sim NG(\mu_{0},\kappa_{0},\alpha_{0},\beta_{0})$.
4) Set prior:
5) Calculate prediction interval
6) Evaluate actual data and create an anomaly flag if necessary
7) Update prior
a. Calculate the posterior parameters using the Conjugate Prior Relationship given in the following formulas, where $J=1$. Here, the sub-index $n=1,\ldots,N$ for each claimant.
8) Calculate changes in expected variable
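Because the conjugate prior formulas themselves are given by reference rather than reproduced above, the sketch below substitutes the standard single-observation (J=1) Normal-Gamma update and the Student-t posterior predictive interval; this is an assumption rather than a verbatim reproduction of the referenced formulas:

import numpy as np
from scipy.stats import t as student_t

def ng_update(params, x):
    # Standard Normal-Gamma conjugate update for one observation x.
    mu0, kappa0, alpha0, beta0 = params
    mu_n = (kappa0 * mu0 + x) / (kappa0 + 1.0)
    kappa_n = kappa0 + 1.0
    alpha_n = alpha0 + 0.5
    beta_n = beta0 + kappa0 * (x - mu0) ** 2 / (2.0 * (kappa0 + 1.0))
    return mu_n, kappa_n, alpha_n, beta_n

def ng_prediction_interval(params, level=0.95):
    # The posterior predictive of a new log-in time is Student-t.
    mu, kappa, alpha, beta = params
    scale = np.sqrt(beta * (kappa + 1.0) / (alpha * kappa))
    return student_t(df=2.0 * alpha, loc=mu, scale=scale).interval(level)

# Per claimant: flag a weekly certification as anomalous if its numeric
# log-in time T falls outside the interval, then update the parameters.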
Once all anomalies have been identified, these disparate indicators must be combined into an Ensemble Fraud Score. This example considers the combination of anomaly indicators that take values in {0, 1}. However, if the different indicators are instead represented by the confidence with which they have been violated, then they can be represented as the inverse of the confidence (1/confidence) and combined using the same process.
In constructing the Ensemble Fraud Score, linear combinations of the underlying indicators can be created: $S=\sum_{j=1}^{J}\alpha_{j}I_{j}$, where $I_{j}$ is the anomaly indicator, $J$ is the total number of anomaly indicators to be combined, and $\alpha_{j}$ are the weights. To set the weights:
1) Consider the correlation of all indicators $I_{j}$. If all pairwise correlations are less than 0.2, then set all $\alpha_{j}=1$. Otherwise, proceed to step 2.
2) If a subset of variables are inter-correlated, in other words, where a small subset of variables have correlations > 0.5, then:
In the case of the Ensemble Fraud Score (S) from above, reason codes can be used to describe the reason that the individual score is obtained. In this case, the reasons are the underlying anomaly indicators $I_{j}$; if $I_{j}=1$, then the claimant has this reason. The reasons are ordered based on the size of the weights. Reasons maintained by the system for each claimant scored are passed along with the Ensemble Fraud Score.
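A sketch of the score construction, assuming the indicators arrive as a 0/1 pandas DataFrame. Because the sub-steps of step 2) are not reproduced above, the down-weighting of inter-correlated indicators shown here is purely a hypothetical choice:

import numpy as np
import pandas as pd

def ensemble_fraud_score(indicators: pd.DataFrame) -> pd.Series:
    corr = indicators.corr().abs()
    off_diag = corr.values[~np.eye(len(corr), dtype=bool)]
    weights = pd.Series(1.0, index=indicators.columns)   # step 1: alpha_j = 1
    if (off_diag >= 0.2).any():
        # Step 2 (illustrative only): down-weight any indicator strongly
        # correlated (> 0.5) with another indicator.
        for j in corr.columns:
            if (corr.loc[j].drop(j) > 0.5).any():
                weights[j] = 0.5        # hypothetical weight choice
    return indicators.mul(weights, axis=1).sum(axis=1)   # S = sum_j alpha_j I_j

# Reason codes: for each claimant, the indicators with I_j = 1, ordered by
# their weights alpha_j, are passed along with the score.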
Appendix C is a glossary of variables that can be used in UI clustering.
The second principal instantiation of the invention described herein utilizes association rules. This instantiation is next described.
Association rules can be used to quantify “normal behavior” for, for example, insurance claims, as tripwires to identify outlier claims (which do not meet these rules) to be assigned for additional investigation. Such rules assign probabilities to combinations of features on claims, and can be thought of as “if-then” statements: if a first condition is true, then one may expect additional conditions to also be present or true with a given probability. According to various exemplary embodiments of the present invention, these types of association rules can be used to identify claims that break them (activating tripwires). If a claim violates enough rules, it has a higher propensity for being fraudulent (i.e., it presents an “abnormal” profile) and should be referred for additional investigation or action.
The association rules creation process produces a list of rules, from which a critical number of such rules can be used in the association rules scoring process applied to future claims for fraud detection.
There are well-known and academically accepted algorithms for quantifying association rules. The Apriori Algorithm is one such algorithm that produces rules of the form: Left Hand Side (LHS) implies Right Hand Side (RHS) with an underlying Support, Confidence, and Lift. This relationship can be represented mathematically as: {LHS}=>{RHS}|(Support, Confidence, Lift). In such algorithms, support is defined as the probability of the LHS event happening: P(LHS)=Support. Confidence is defined as the conditional probability of the RHS given the LHS: P(RHS|LHS)=Confidence. The Lift measures the degree to which the two events deviate from independence: P(LHS & RHS)/[P(LHS)*P(RHS)]=Lift.
The typical use of association rules is to associate likely events together. This is often used in sales data. For example, a grocery store may notice that when a shopping basket includes butter and bread, then 90% of the time the basket also includes milk. This can be expressed as an association rule of the form {Butter=TRUE, Bread=TRUE}=>{Milk=TRUE}, where the Confidence is 90%. Exemplary embodiments of the present invention employ the underlying novel concept of inverting the rule and utilizing the logical converse of the rule to identify outliers and thus fraudulent claims. In the example above, this translates to looking for the 10% of shoppers who purchase butter and bread but not milk. That is an “abnormal” shopping profile.
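By way of non-limiting illustration, the grocery example can be expressed as a simple filter over a table of binary attributes. The following Python sketch assumes a pandas DataFrame of 0/1 columns with hypothetical names; it returns the records presenting the “abnormal” profile (LHS satisfied, RHS absent):

import pandas as pd

def rule_violators(df: pd.DataFrame, lhs: list, rhs: str) -> pd.DataFrame:
    # A high-confidence rule {LHS} => {RHS} defines "normal" behavior.
    lhs_true = df[lhs].all(axis=1)       # e.g., butter AND bread present
    rhs_true = df[rhs].astype(bool)      # e.g., milk present
    # Logical converse: LHS satisfied but RHS absent is the "abnormal"
    # profile (the ~10% of shoppers in the example above).
    return df[lhs_true & ~rhs_true]

# Example: rule_violators(baskets, lhs=["butter", "bread"], rhs="milk")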
As with the clustering instantiation described above, the association rules instantiation should begin with a database of raw claims information and characteristics that can be used as a training set (“claims” is understood in the broadest possible sense here, as noted above). Using such a training set, rules can be created, and then applied to new claims or transactions not included in the training set. From such a database, relevant information can be extracted that would be useful for the association rules analysis. For example, in an automobile BI context, different types and natures of injuries may be selected along with the damage done to different parts of the vehicle.
Claims that are thought to be normal are first selected for the analysis. These are claims that, for example, were not referred to an SIU or similar authority or department for additional investigation. These can be analyzed first to provide a baseline on which the rules are defined.
A binary flag for suspicious types of injuries can be generated, for example. In general, as previously discussed, suspicious types of claims include subjective and/or objectively hard to verify damages, losses or injuries. In the example of BI claims, soft tissue injuries are considered suspicious as they are more difficult to verify, as compared to a broken bone, burn, or more serious injury, which can be palpated, seen on imaging studies, or that has otherwise easily identifiable symptoms and indicia. In the auto BI space, soft tissue claims are considered especially suspicious and it is considered common knowledge that individuals perpetrating fraud take advantage of these types of injuries (sometimes in collusion with health professionals specializing in soft tissue injury treatment) due to their lack of verifiability. This example illustrates that the inventive association rules approach can sort through even the most suspicious types of claims to determine those with the highest propensity to be fraudulent.
To generate the association rules, any predictive numeric, non-binary variables should be transformed into binary form. Then, for example, binary bins can be created based on historical cut points for the claim. These cut points can be, for example, the medians of the numeric variables selected during the creation process. Other types of averages (e.g., mean, mode) could also be used in this algorithm, but may arrive at suboptimal cut points in some cases. The choice of the central measure should be selected such that the variable is cut as symmetrically as possible. Viewing each variable's histogram can enable determination of the correct choice. Selection of the most symmetric cut point helps ensure that arbitrary inclusion of very common variable values in rule sets is avoided as much as possible. Similarly, discrete numeric variables with fewer than ten distinct values should be treated as categorical variables to avoid the same pitfall. Such empirical binary cut points can be saved for use in the association rules scoring process.
Binary 0/1 variables are created for all categorical attributes selected during the creation process. This can be accomplished by creating one new variable for each category and setting the record level value of that variable to 1 if the claim is in the category and 0 if it is not. For instance, suppose that the categorical variable in question has values of “Yes” and “No”. Further suppose that claim 1 has a value of “Yes” and claim 2 has a value of “No”. Then, two new variables can be created with arbitrarily chosen but generally meaningful names. In this example, Categorical_Variable_Yes and Categorical_Variable_No will suffice. Since claim 1 has a value of “Yes”, Categorical_Variable_Yes would be set to 1 and Categorical_Variable_No would be set to 0. Likewise, for claim 2, Categorical_Variable_Yes would be set to 0 and Categorical_Variable_No would be set to 1. This can be continued for all categorical values and all categorical variables selected during the creation process.
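The two transformations just described (empirical binary cut points and 0/1 category variables) can be sketched as follows, assuming a pandas DataFrame and hypothetical column lists; the saved cut points would be reused in the association rules scoring process:

import pandas as pd

def binarize_for_rules(df: pd.DataFrame, numeric_cols, categorical_cols):
    out = pd.DataFrame(index=df.index)
    cut_points = {}
    # Numeric variables: cut at the median (or another central measure)
    # and save the cut point for the scoring process.
    for col in numeric_cols:
        cut_points[col] = df[col].median()
        out[col + "_high"] = (df[col] > cut_points[col]).astype(int)
    # Categorical variables: one 0/1 column per category, e.g.
    # Categorical_Variable_Yes / Categorical_Variable_No.
    dummies = pd.get_dummies(df[categorical_cols], prefix=categorical_cols)
    return out.join(dummies.astype(int)), cut_points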
Known association rules algorithms can be used to generate potential rules that will be tested against the claims and fraud determinations of those claims that were referred to the SIU. The LHS may comprise multiple conditions, although here and in the Apriori Algorithm, the RHS is generally restricted to a single feature. As an example, let LHS={fracture injury to the lower extremity=TRUE, fracture injury to the upper extremity=TRUE} and RHS={joint injury=TRUE}. Then, the Apriori Algorithm could be leveraged to estimate the Support, Confidence, and Lift of these relationships. Assuming, for example, that the Confidence of this rule is 90%, then it is known that in claims where there are fractures of the upper and lower extremities, 90% of these individuals also experience a joint injury. That is the “normal” association seen. Thus, for the purpose of fraud detection, claims with a joint injury without the implied initial conditions of fractures to the upper and/or lower extremities are being sought out. This is a violation of the rule, indicating an “abnormal” condition.
Using association rules and features of the claims related to the various types of injury and various body parts affected, multiple independent rules can be constructed with high confidence. If the set of rules covers a material proportion of the probability space of the RHS condition, then the LHS conditions provide alternate different—but nonetheless legitimate—pathways to arrive at the RHS condition. Claims that violate all of these paths are considered anomalous. It is true that any claim violating even a single rule might be submitted to SIU for further investigation. However, to avoid a high false positive rate, a higher threshold can be used. The threshold can be determined by examining the historical fraud rate and optimizing against the number of false positives that are achieved.
According to exemplary embodiments, setting the rules violation thresholds begins by evaluating the rate of fraud among all claims violating a single rule. If the rate of fraud is not better than the rate of fraud found in the set of all claims referred to SIU, then the threshold can be increased. This may be repeated, increasing the threshold until the rate of fraud detected exceeds that of all claims referred to SIU. In some cases, a single rule violation may outperform a combination of rules that are violated. In such circumstances, multiple thresholds may be used. Alternatively, the threshold level can be set to the highest value found in all possible combinations.
Once association rules have been created based on a training set, an exemplary scoring process for the association rules can be applied to new claims. Such a process is described in
The association rules generated may have the logical form IF {LHS conditions are true} THEN {RHS conditions are true with probability S}. To apply the association rules (generated at step 270 of
If a claim meets the RHS conditions for any rules, then the claim may be tested against the LHS conditions (step 170). If the claim meets the RHS and LHS conditions, then the claim is also sent through the normal claims handling process (step 180), recalling that this is appropriate because, in this example, the rules defined a “normal” claim profile.
If the claim meets the RHS conditions but does not meet the LHS conditions for a critical number of rules at step 170, which is predefined in the association rules creation process, then the claim may be routed to the SIU for further investigation (step 185). For example, assume that exemplary predefined association rules are the following:
1) {Head Injury=TRUE}=>{Neck Injury=TRUE}
2) {Joint Sprain=TRUE}=>{Neck Sprain=TRUE}
3) {Rear Bumper Vehicle Damage=TRUE}=>{Neck Sprain=TRUE}
Using this rule set, and further assuming that the critical value is violation of two rules, non-“normal” claims may be identified. For example, if a claim presents a Neck Injury with no Head Injury, and a Neck Sprain without damage to the rear bumper of the vehicle, it violates the “normal” paradigm inherent in the data twice, meeting the critical number, and the claim can be referred to the SIU for further investigation as having an elevated likelihood of involving fraud. This illustrates the “tripwires” described above, which refer to violations of a normal profile. If enough tripwires are pulled, something is presumably not right.
Thus, to summarize, in applying the association rule set the claims are evaluated against the subsequent conditions of each rule—the RHS. Claims that satisfy the RHS are evaluated against the initial condition—the LHS. Claims that satisfy the RHS but do not satisfy the LHS of a particular rule are in violation of that rule, and are assigned for additional investigation if they meet the threshold number of total rules violated. Otherwise, the claims are allowed to follow the normal claims handling procedure.
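This summarized scoring logic can be sketched in Python, assuming each claim is a row of 0/1 attributes, each rule is supplied as a hypothetical (lhs_columns, rhs_column) pair, and the violation threshold has been predefined as described above:

import pandas as pd

def score_claims(df: pd.DataFrame, rules, threshold: int) -> pd.Series:
    # rules: list of (lhs_cols, rhs_col) pairs defining {LHS} => {RHS}.
    violations = pd.Series(0, index=df.index)
    for lhs_cols, rhs_col in rules:
        rhs_true = df[rhs_col].astype(bool)   # claim satisfies the RHS
        lhs_true = df[lhs_cols].all(axis=1)   # claim satisfies the LHS
        violations += (rhs_true & ~lhs_true).astype(int)   # rule violated
    # Claims at or above the threshold are assigned for investigation;
    # all others follow the normal claims handling procedure.
    return violations >= threshold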
To further illustrate these methods, next described are exemplary processes for creating association rules and, using those rules, scoring insurance claims for potential fraud. Appendix E sets forth an exemplary algorithm to find a set of association rules with which to evaluate new claims; and Appendix F sets forth an exemplary algorithm to score such claims using association rules.
As previously discussed, the goal of association rules is to create a set of tripwires to identify fraudulent claims. Thus, a pattern of normal claim behavior can be constructed based on the common associations between claim attributes. For example, as noted above, 95% of claims with a head injury also have a neck injury. Thus, if a claim presents a neck injury without a head injury, this is suspicious. Probabilistic association rules can be derived from raw claims data using a commonly known method such as, for example, the Apriori Algorithm, as noted above, or, alternatively, using various other methods. Independent rules can be selected which form strong associations between claim attributes, with probabilities greater than, for example, 95%. Claims violating the rules can be deemed anomalous, and can thus be processed further or sent to the SIU for review. Two example scenarios are next presented: an automobile bodily injury claim fraud detector, and a similar approach to detect potential fraud in an unemployment insurance claim context.
Example variables (see also the list of variables in Appendix D):
The ultimate goal of the association rules is to find outlier behavior in the data. As such, true outliers should be left in the data to ensure that the rules are able to capture truly normal behavior. Removing true outliers may cause combinations of values to appear more prevalent than represented by the raw data. Data entry errors, missing values, or other types of outliers that are not natural to the data should be imputed. There are many methods of imputation discussed broadly in the literature. A few options are discussed below, but the method of imputation depends on the type of “missingness”, type of variable under consideration, amount of “missingness”, and to some extent user preference.
For continuous variables without good proxy estimators, and with only a few values missing, mean value imputation works well. Given that the goal of the rules is to define normal soft tissue injury claims, a threshold of 5% missing values, or the rate of fraud in the overall population (whichever is lower) should be used. Mean imputation of more than this amount may result in an artificial and biased selection of rules containing the mean value of a variable since the mean value would appear more frequently after imputation than it might appear if the true value were in the data.
If the historical record is at least partially complete, and the variable has a natural relationship to prior values, then a last value carried forward method can be used. Vehicle age is a good example of this type of variable. If the historical record is also missing, but a good single proxy estimator is available, the proxy should be used to impute the missing values. For instance, if age is entirely missing, a variable such as driving experience could be used as a proxy estimator. If the number of missing values is greater than the threshold discussed above and there is no obvious single proxy estimator, then methods such as multiple imputation (MI) may be used.
Categorical variables may be imputed using methods such as last value carried forward if the historical record is at least partially complete and the value of the variable is not expected to change over time. Gender is a good example of such a variable. Other methods, such as MI, should be used if the number of missing values is less than a threshold amount, as discussed above, and good proxy estimators do not exist. Where good proxy estimators do exist, they should be used instead. As with continuous variables, other methods of imputation, such as, for example, logistic regression or MI, should be used in the absence of a single proxy estimator and when the number of missing values is more than the acceptable threshold.
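The imputation choices discussed above can be sketched as follows, under the assumptions that claims are sorted chronologically within each claimant and that the column names and the age offset used here are purely hypothetical:

import pandas as pd

def impute_for_rules(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Mean value imputation: continuous variable, few missing values,
    # no good proxy estimator (subject to the ~5% threshold above).
    df["damage_amount"] = df["damage_amount"].fillna(df["damage_amount"].mean())
    # Last value carried forward: variable with a natural relationship
    # to prior values, applied within each claimant's sorted history.
    df["vehicle_age"] = df.groupby("claimant_id")["vehicle_age"].ffill()
    # Single proxy estimator: infer age from driving experience when age
    # is entirely missing (the offset of 16 is a hypothetical choice).
    df["age"] = df["age"].fillna(df["driving_experience"] + 16)
    return df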
As noted above, soft tissue injuries include sprains, strains, neck and trunk injuries, and joint injuries. They do not include lacerations, broken bones, burns, or death (i.e., items which are impossible to fake). If a soft tissue injury occurs in conjunction with one of these, the flag is set to 0. For instance, if an individual was burned and also had a sprained neck, the soft tissue injury flag would be set to 0. The theory is that most people who were actually burned would not go through the trouble of adding a false sprained neck. Items included in the soft tissue injury assessment must occur in isolation for the flag to be set to 1.
Discrete numeric variables with five or fewer distinct values are not continuous and should be treated as categorical variables. Numeric variables must be discretized to use any association rules algorithm, since these algorithms are designed with categorical variables in mind. Failing to bin the variables can result in the algorithm selecting each discrete value as a single category, thus rendering most numeric variables useless in generating rules. For instance, suppose damage amount is a variable under consideration and the claims under consideration have amounts with dollars and cents included. It is likely that a high number of claims (98% or more) will have unique values for this variable. As such, each individual value of the variable will have very low frequency in the dataset, making every instance appear as an anomaly. Since the goal is to find non-anomalous combinations to describe a “normal” profile, these values will not appear in any rules selected, rendering the variable useless for rules generation.
Generally, 2 to 6 bins perform best, but the number of bins depends on the quality of the rules generated and existing patterns in the data. Too few bins may result in a very high frequency variable which performs poorly at segmenting the population into normal and anomalous groups. Too many bins will create low support rules, which may result in poor performing rules or may require many more combinations of rules, making selection of the final rule set much more complex.
The operative algorithm automates the binning process with input from the user to set the maximum number of bins and a threshold for selecting the best bins based on the difference between the bin with the maximum percentage of records (claims) and the bin with the minimum percentage of records (claims). Selecting the threshold value for binning is accomplished by first setting a threshold value of 0 and allowing the algorithm to find the best set of bins. As discussed above, rules are created and the variables are evaluated to determine if there are too many or too few bins. If there are too many bins, the threshold limit can be increased, and vice-versa for too few bins.
1. Less than 45.5 days
2. 45.5 days
3. More than 45.5 days
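One possible reading of the automated binning procedure described above is sketched below; the use of quantile-based candidate bins and the acceptance test shown are assumptions, not a definitive reconstruction of the operative algorithm:

import pandas as pd

def auto_bin(series: pd.Series, max_bins: int = 6, threshold: float = 0.0):
    best = None
    # Try the deepest binning first, backing off toward 2 bins.
    for k in range(max_bins, 1, -1):
        binned = pd.qcut(series, q=k, duplicates="drop")
        shares = binned.value_counts(normalize=True)
        spread = shares.max() - shares.min()   # max bin share - min bin share
        if best is None or spread < best[0]:
            best = (spread, binned)
        if spread <= threshold:
            return binned    # deepest binning within the allowed spread
    return best[1]           # otherwise, the most balanced binning found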
In general, bins should be of equal width (as to number of records in each) to promote inclusion of each bin in the rules generation process. For example, if a set of four bins were created so that the first bin contained 1% of the population, the second contained 5%, the third contained 24%, and the fourth contained the remaining 70%, the fourth bin would appear in most or every rule selected. The third bin may appear in a few rules selected and the first and second bins would likely not appear in any rules. If this type of pattern appears naturally in the data (as in the graphs above), the bins should be formed to include as equal a percentage of claims in each bucket as possible. In this example, two bins would be produced—a first one combining the first three bins, with 30% of the claims, and a second bin, being the fourth bin, with 70% of the claims.
Creating binary bins has the advantage of increasing the probability that each variable will be included in at least one rule, but reduces the amount of information available. Thus, this technique should only be used when a particular variable is not found in any selected rules but is believed to be important in distinguishing normal claims from abnormal claims.
Binary bins can be created using either the median, mode, or mean of the numeric variable. Generally, the median is preferred; however, the choice of the central measure should be selected such that the variable is cut as symmetrically as possible. Viewing each variable's histogram will aid determination of the correct choice.
For example,
Depending on the algorithm employed to create rules, categorical variables may need to be split into 0/1 binary variables. For instance, the variable gender would be split into two variables male and female. If gender=‘male’ then the male variable would be set to 1 and female would be set to 0, and vice versa for a value of ‘female’. Other common categorical variables (and their values) may include:
The following algorithm (see also
FIGS. 14a-14d show the results of applying the algorithm to the applicant's age with a maximum of 6 bins and threshold values of 0.0 and 0.10, respectively. With a threshold of 0, 4 bins are selected, with a slight height difference between the first bin and the other bins. With a threshold of 0.10 (bins are allowed to differ more widely), 6 bins are selected and the variation is larger between the first two bins and the last four bins.
An initial set of variables to consider for association rules creation is developed to ensure that variables known to associate with fraudulent claims are entered into the list. The variable list is generally enhanced by adding macro-economic and other indicators associated with the claimant or policy state or MSA (Metropolitan Statistical Area). Additionally, synthetic variables such as date lags between the accident date and when an attorney is hired, or distance measures between the accident site and the claimant's home address, are also often included. Synthetic variables, properly chosen, are often very predictive. As noted above, the creation of synthetic variables can be automated in exemplary embodiments of the present invention.
Highly correlated variables should not be used, as they will create redundant but not more informative rules. For example, indicator variables for upper body joint and lower body joint sprains should be chosen rather than a generic joint sprain variable. Most variables from this initial list are then naturally selected as part of the association rules development. Many variables which do not appear in the LHS given the selected support and confidence levels are eliminated from consideration. However, it is possible that some variables which do not appear in rules initially may become part of the LHS if highly frequent variables which add little information are removed.
Variables with high frequency values may result in poor performing “normal” rules. For example, most soft tissue injuries are to the neck and trunk. If a variable indicating this were used, a rule describing the normal soft tissue injury claim would indicate that a neck and trunk injury is normal. However, this rule may not perform well, as it would indicate that any joint injury is anomalous, even though individuals with joint injuries may not commit fraud at higher rates. Thus, the rule would not segment the population into high fraud and low fraud groups. When this occurs, the variable should be eliminated from the rules generation process.
As shown in Table 17, spinal sprains occur in all rules in which the RHS is a neck and trunk injury. This is a somewhat uninformative and expected result. Removing the variable from consideration may allow other information to become apparent in the rules, thus providing better insight into normal injury and behavior combinations. Table 18 below shows a sample of rules with support and confidence in the same range, but with more informative information.
Normal Profile:
The goal of the association rule scoring process is to find claims that are abnormal, by seeing which of the “normal” rules are not satisfied (i.e., the tripwires having been “tripped”). However, association rules are geared to finding highly frequent item sets rather than anomalous combinations of items. Thus, rules are generated to define normal and any claim not fitting these rules is deemed abnormal. Accordingly, as noted, rules generation is accomplished using only data defining the normal claim. If the data contains a flag identifying cases adjudicated as fraudulent, those claims should be removed from the data prior to creation of association rules since these claims are anomalous by default, and not descriptive of the “normal” profile. Rules can then be created, for example, using the data which do not include previously identified fraudulent claims.
Abnormal or Fraudulent Profile:
Optionally, additional rules may be created using only the claims previously identified as fraudulent and selecting only those rules which contain the fraud indicator on the RHS. In practice, the results of this approach are limited when used independently. However, combining rules which identify fraud on the RHS with rules that identify normal soft tissue injuries may improve predictive power. This is accomplished by running all claims through the normal rules and flagging any claims which do not meet the LHS condition but satisfy the RHS condition. These abnormal claims can then, for example, be processed through the fraud rules, and claims meeting the LHS condition are flagged for further investigation. Examples of these types of rules are shown in Table 19 below.
Note that these anomalous rules have a very low support (the probability of the LHS event even happening is low) but high confidence (if and when the LHS event does occur, the RHS event almost always occurs). Thus, the LHS occurs very infrequently when a soft tissue injury is indicated.
As previously noted, there are multiple algorithms for quantifying association rules. The Apriori Algorithm, frequent item sets, predictive Apriori, Tertius, and generalized sequential pattern generation algorithms, for example, all produce rules of the form: LHS implies RHS with underlying Support and Confidence. Again, support is the probability of the LHS event happening: P(LHS)=Support; confidence is the conditional probability of the RHS given the LHS: P(RHS|LHS)=Confidence.
For example, let LHS={fracture injury to the lower extremity=TRUE, fracture injury to the upper extremity=TRUE} and RHS={joint injury=TRUE}. Fractures are less common events in auto BI claims and fractures to both upper and lower extremities are rare. Thus the support of this rule might be only 3%. However, when fractures of both upper and lower extremities exist, other joint injuries are commonly found. The Confidence of this rule might be 90%. This indicates that in claims where there are fractures of the upper and lower extremities, 90% of these individuals also experience a joint injury. The probability of the full event would be 2.7%. That is, 2.7% of all BI claims would fit this rule.
Most association rules algorithms require a support threshold to prune the vast number of rules created during processing. A low support threshold (~5%) would create millions or even tens of millions of rules, making the evaluation process difficult or impossible to accomplish. As such, a higher threshold should be selected. This can be done incrementally, for example, by choosing an initial support value of 90% and increasing or decreasing the threshold until a manageable number of rules is produced. Generally, 1,000 rules is a good upper bound, but that may be increased as computing power, RAM and computing speed all increase. The confidence level can, for example, further reduce the number of rules to be evaluated.
In auto BI claims, fraud tends to happen in claims where there are injuries to the neck and/or back, as these are easier to fake than fractures or more serious injuries. This is a particular instance of the general source of fraud, which is subjective self-reported bases for a monetary or other benefit, where such bases are hard or impossible to independently verify. Using association rules and features of the claims related to the types of injury and body part affected, multiple independent rules with high support and confidence can be constructed. The goal is to find rules that describe “normal” BI claims containing only soft tissue injuries. What is desired are rules of the form LHS=>{soft tissue injury} in which the rules are of high Confidence. If the RHS is present without the LHS, a violation of the rule occurs. Support is used to reduce the number of rules to the least possible number needed to produce the highest rate of true positives and lowest rate of false negatives when compared against the fraud indicator. Table 20 below sets forth exemplary output of an association rules algorithm with various metrics displayed.
The first three would be kept in this example since they have high confidence and high support. This indicates that the claim elements in the LHS occur quite frequently (are normal) and that when they occur there are often soft tissue injuries. Thus, these describe normal soft tissue injuries. The next three rules have high confidence, but low support. These are abnormal soft tissue injuries. These may be considered for a secondary set of anomalous rules, as described above in connection with
To evaluate individual rules one can, for example, first subset the data into those claims that satisfy the RHS condition (they are soft tissue injuries). Then, find all claims that violate the LHS condition and compare the rate of fraud for this subpopulation to the overall rate of fraud in the entire population. Keep the rule if it segments the data such that cases violating the LHS have a higher rate of fraud than the overall population. Eliminate rules whose violators have the same or a lower rate of fraud compared to the overall population.
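A sketch of this evaluation, assuming a pandas DataFrame with 0/1 rule condition columns and a binary fraud indicator column (all names hypothetical):

import pandas as pd

def evaluate_rule(df: pd.DataFrame, lhs_cols, rhs_col, fraud_col="fraud"):
    # Subset to claims satisfying the RHS (e.g., soft tissue injuries).
    rhs_claims = df[df[rhs_col].astype(bool)]
    violators = rhs_claims[~rhs_claims[lhs_cols].all(axis=1)]
    overall_rate = df[fraud_col].mean()
    violator_rate = violators[fraud_col].mean()
    # Keep the rule only if its violators show an elevated fraud rate.
    return violator_rate > overall_rate, violator_rate, overall_rate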
Normal rules can then, for example, be tested on the full dataset. Table 21 above depicts the outcome of a particular rule (columns add to 100%). Note that the fraud rate for the population meeting the rule (Normal=Yes) is 6% compared to the fraud rate for the population which does not meet the rule at 8%. This indicates a well performing rule which should be kept. When evaluating individual rules, the threshold for keeping a rule should be set low. Generally, for example, if there is improvement in the first decimal place, the rule should be initially kept. A secondary evaluation using combinations of rules will further reduce the number of rules in the final rule set.
Once all LHS conditions are tested and the set of LHS rules to keep are determined, test the combined LHS rules against those cases which meet the RHS condition. If the overall rate of fraud is higher than the rate of fraud in the full population, then the set of rules performs well. Given that each rule individually performs well, the combined set generally performs well. However, combining all LHS rules may also eliminate truly fraudulent cases resulting in a large number of false negatives. Thus, different combinations of rules must be tested to find those combinations which result in low false negative values and high rates of fraud.
Note the behavior of rules violated versus the SIU referral rate in Table 22 above. As more rules are violated, fewer of the resulting claims in the subpopulation were historically selected for investigation, but the subpopulation has a much higher rate of fraud. This is the desired behavior, as it indicates that the rules are uncovering potentially previously unknown fraud. Table 22 illustrates how the number of claims identified as known fraud and the expected number of claims with previously unknown fraud change as multiple rules are combined. Applying only the first rule yields a known fraud rate of 55% and an expected 903 claims with previously unknown fraud. At first this may seem very good, suggesting that perhaps only the first rule should be applied. However, the lower known fraud rate gives less confidence about the actual level of fraud in the expected fraudulent claims; there is less confidence that all 903 claims will in fact be fraudulent. Combining the first two rules does not improve this appreciably, giving further evidence that more rules are needed. The jump to 75% known fraud after adding the third rule provides much more confidence that the 155 suspected fraudulent claims will contain a very high rate of fraud. Including the fourth rule does not improve the known fraud rate but significantly reduces the number of potentially fraudulent claims from 155 to 26. Thus, for example, applying the first three rules in combination provides the best solution. The fourth rule is not thrown out immediately, as it may combine well with other rules; if, after checking all combinations, the fourth rule performs as it does in this example, then it is eliminated.
The ultimate set of rule combinations results in the confusion matrix depicted in Table 23 below, which exhibits a good predictive capability. Note that the 6% of claims predicted to be fraudulent, but not currently flagged as fraudulent, are the expected claims containing unknown currently undetected fraud. These claims are not considered false positives. Also note that the false negative rate is very low at 1%. Therefore the overall combination of rules performs well. The final list of exemplary rules is provided below.
Exemplary Algorithm for Exhaustively Testing Rules for Inclusion (see also
Table 24 below lists the final rules produced in this example.
As noted above, once a set of association rules has been generated from a sample set of claims (a training set), it can then, in exemplary embodiments, be used to score new claims. The following describes scoring of claims for the exemplary Auto BI example described above.
This can be essentially the same as set forth above in connection with the auto BI clustering example.
For a claim coming into the system, the values of each of the 128 variables can be populated and then standardized, as noted above. In exemplary embodiments, this may be done through the following process:
Impute Missing Values:
a. If the variable value is not present for a given claim, the value must be imputed based on the Missing Value Imputation Instructions provided. This must be replicated for each variable to ensure values are provided for each variable for a given claim.
b. For example, if a claim does not have a value for the variable ACCOPENLAG (the lag in days between the accident date and the BI line open date), and the instructions require using a value of 5 days, then the value of this variable for the claim can be set to 5.
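A minimal sketch of this imputation step follows, assuming the Missing Value Imputation Instructions can be expressed as a simple variable-to-default mapping; the entries shown are illustrative rather than actual Seed Data values.

    IMPUTATION_DEFAULTS = {
        "ACCOPENLAG": 5,  # lag in days between the accident date and the BI line open date
        # ... one entry per predictive variable, per the Imputation Instructions ...
    }

    def impute_missing(claim):
        # Return a copy of the claim with every absent variable filled from the defaults.
        filled = dict(claim)
        for var, default in IMPUTATION_DEFAULTS.items():
            if filled.get(var) is None:
                filled[var] = default
        return filled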
Variable Split Definitions:
Each of the 128 predictive variables can be transformed into a binary flag. This may be accomplished by utilizing the Variable Split Definitions from the Seed Data. These split definitions are rules of the form IF-THEN-ELSE that split each numeric variable into a binary flag. For example:
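The split-definition example itself is not reproduced above; the following hypothetical Python illustration shows the IF-THEN-ELSE form, with an assumed variable and cut point:

    def split_accopenlag(value):
        # IF ACCOPENLAG <= 30 THEN 0 ELSE 1 (the 30-day threshold is illustrative only).
        return 0 if value <= 30 else 1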
Categorical variables not coded as 0/1 can be split into 0/1 binary variables. For example, acc_day (the day of the week the accident takes place) consists of the values 1-7. Each value would become its own variable and would have the value 1 if the original variable corresponds, and 0 otherwise. For example, a variable acc_day_3 might be created and acc_day_3=1 when acc_day=3 and acc_day_3=0 otherwise.
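A brief sketch of this expansion using pandas (the column name and data are assumed for illustration):

    import pandas as pd

    claims = pd.DataFrame({"acc_day": [3, 1, 7, 3]})  # illustrative data
    dummies = pd.get_dummies(claims["acc_day"], prefix="acc_day").astype(int)
    # Produces columns acc_day_1 ... acc_day_7; acc_day_3 is 1 exactly when acc_day == 3.
    claims = pd.concat([claims, dummies], axis=1)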
The following variables can benefit from this process:
The association rules scoring process in this example is focused on claims with a soft tissue injury, such as a back injury, for the reasons described above. Thus, the first step in the scoring process is to select only those claims which have a soft tissue injury. If there is no soft tissue injury, these claims are not flagged for referral to the SIU in the same way.
If the claim involves a claimant with a soft tissue injury, then the following process can, for example, be used to forward claims to the SIU:
A series of rules is generated using the Seed Data (see, e.g., Table 26). These rules are of the form: {LHS Condition}=>{RHS Condition}. First, all claims are evaluated against the LHS conditions of the rules. If a claim does not meet any of the LHS conditions, then it is not forwarded on to the SIU. If it meets the LHS condition of any of the rules, then proceed to the next step.
For example, a rule might be: {Claimant Rear Bumper Damage, Insured Front End Damage}=>{Neck Injury}. A claim flagged by this rule is flagged because it has both rear bumper damage for the claimant and front end damage for the insured (i.e., the insured vehicle rear-ended the claimant vehicle).
In exemplary embodiments, for each claim, the appropriate RHS conditions can be evaluated that correspond to the LHS conditions which flagged each claim. In the example from the prior section, the claim involves rear bumper damage to the claimant and front end damage to the insured. Then, the claim is compared against the right hand side of the rule: Does the claim also have a Neck Injury?
If there is no neck injury, then the claim has violated a rule. The count of all violations can then be summed over all rules that apply to each claim.
Select Claims that Fail to Trigger a Critical Number of RHS:
Once all rules have been evaluated against the claims, then the claims which have a violation count larger than the critical number can be forwarded to the SIU. The critical number can be set based on the training set data. In this example, the critical number is 4. Claims with 4 or more violations will be forwarded to the SIU for further investigation.
There are potential exceptions to the rule for forwarding claims to the SIU. These business rules would be customized to a particular user's individual claims department, for example, but all exceptions would keep a claim from being forwarded to the SIU. For example, as already noted above, if the claim involves death, do not forward the claim to the SIU.
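A minimal sketch of this forwarding logic, assuming each rule is a (lhs, rhs) pair of predicates over a claim dict and that exception checks are supplied by the user's claims department:

    CRITICAL_NUMBER = 4  # set from the training set, per the example above

    def count_violations(claim, rules):
        # A rule is violated when its LHS condition holds but its RHS condition does not.
        return sum(1 for lhs, rhs in rules if lhs(claim) and not rhs(claim))

    def should_refer_to_siu(claim, rules, exceptions=()):
        # Business-rule exceptions (e.g., the claim involves death) block forwarding.
        if any(exception(claim) for exception in exceptions):
            return False
        return count_violations(claim, rules) >= CRITICAL_NUMBER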
Next described is an exemplary process of creating association rules for fraud detection in Unemployment Insurance (UI) claims. The goal of the association rules is to create a set of tripwires to identify fraudulent claims. A pattern of normal claim behavior is constructed based on the common associations between the claim attributes. For example, 75% of claims from blue collar workers are filed in the late fall and winter. Probabilistic association rules are derived on the raw claims data using a commonly known method such as the frequent item sets algorithm (other methods would also work). Independent rules are selected which form strong associations between attributes on the application, with probabilities greater than 95%, for example. Applications violating the rules are deemed anomalous and are processed further or sent to the SIU for review.
Example Variables:
The ultimate goal of the association rules is to find outlier behavior in the data. As such, true outliers should be left in the data to ensure that the rules are able to capture normal behavior; removing true outliers may cause combinations of values to appear more prevalent than they are in the raw data. Data entry errors, missing values, and other types of outliers that are not natural to the data, however, should be imputed. There are many methods of imputation available, but the appropriate method depends on the type of “missingness”, the type of variable under consideration, the amount of “missingness”, and to some extent user preference.
The following discussion is similar to that presented above for the Auto BI example. It is repeated here for ready reference.
For continuous variables without good proxy estimators and with few values missing, mean value imputation works well. Given that the goal of the rules being developed is to define normal UI claims, a threshold on the proportion of missing values of 5%, or the rate of fraud in the overall population (whichever is lower), should be used. Mean imputation beyond this amount may result in an artificial and biased selection of rules containing the mean value of a variable, since the mean value would appear more frequently after imputation than it would if the true values were in the data.
If the historical record is at least partially complete and the variable has a natural relationship to prior values, then last value carried forward can be used; applicant age and gender are good examples of this type of variable. If the historical record is also missing, but a good single proxy estimator is available, the proxy should be used to impute the missing values. For instance, if Maximum Eligible Benefit Amount is entirely missing, a variable such as SOC could be used to develop an estimate. If the number of missing values is greater than the threshold discussed above and there is no obvious single proxy estimator, then methods such as MI (multiple imputation) should be used.
Categorical variables may be imputed using methods such as last value carried forward if the historical record is at least partially complete and the value of the variable is not expected to change over time; gender is a good example. Where good proxy estimators exist, they should be used. As with continuous variables, other methods of imputation such as logistic regression or MI should be used in the absence of a single proxy estimator, particularly when the number of missing values is more than the acceptable threshold.
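The guidance above can be summarized in a short decision sketch; the function and argument names here are illustrative assumptions, not part of the inventive method:

    def choose_imputation(frac_missing, overall_fraud_rate, has_history, has_proxy, is_continuous):
        # Threshold: 5% or the overall rate of fraud, whichever is lower.
        threshold = min(0.05, overall_fraud_rate)
        if has_history:
            return "last value carried forward"   # e.g., applicant age, gender
        if has_proxy:
            return "single proxy estimator"       # e.g., SOC -> Maximum Eligible Benefit Amount
        if is_continuous and frac_missing <= threshold:
            return "mean value imputation"
        return "multiple imputation / logistic regression"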
The RHS can be determined entirely by the association rules algorithm or a common RHS may be selected to generate rules which have more meaning and provide an organized series of rules for scoring. In this example, a grouping of the SOC industry codes was used.
Discrete numeric variables with five or fewer distinct values are not continuous and should be treated as categorical variables. Numeric variables must be discretized to use any association rules algorithm, since these algorithms are designed with categorical variables in mind. Failing to bin the numeric variables will result in the algorithm selecting each discrete value as a single category, rendering most numeric variables useless in generating rules. For instance, suppose eligibility amount is a variable under consideration and the claims under consideration have amounts with dollars and cents included. It is likely that a high number of claims (98% or better) will have unique values for this variable. As such, each individual value of the variable will have very low frequency on the dataset, making every instance an anomaly. Since the goal is to find non-anomalous combinations, these values will not appear in any rules selected, rendering the variable useless for rules generation.
Generally, 2 to 6 bins perform best, but the number of bins is dependent on the quality of the rules generated and existing patterns in the data. Too few bins may result in a very high frequency variable which performs poorly at segmenting the population into normal and anomalous groups. Too many bins (as in the extreme example above) will create low support rules, which may result in poor performing rules or may require many more combinations of rules, making the selection of the final rule set much more complex.
The algorithm below automates the binning process, with input from the user to set the maximum number of bins and a threshold for selecting the best bins based on the difference between the bin with the maximum percentage of records and the bin with the minimum percentage of records. Selecting the threshold value for binning is accomplished by first setting a threshold value of 0 and allowing the algorithm to find the best set of bins. As discussed above, rules are created and the variables are evaluated to determine if there are too many or too few bins. If there are too many bins, the threshold limit can be decreased (and increased if there are too few), since a larger threshold allows the bins to differ more widely and thus permits more of them.
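A minimal sketch of this automated binning search, assuming quantile-based candidate bins and measuring imbalance as the difference between the largest and smallest bin percentages:

    import pandas as pd

    def auto_bin(values, max_bins=6, threshold=0.0):
        # For each candidate bin count, record the height imbalance
        # (max bin share minus min bin share) of the resulting bins.
        candidates = {}
        for k in range(2, max_bins + 1):
            binned = pd.qcut(values, q=k, duplicates="drop")
            shares = binned.value_counts(normalize=True)
            candidates[k] = (shares.max() - shares.min(), binned)
        # Prefer the most bins whose imbalance is within the threshold; otherwise
        # fall back to the bin count with the smallest imbalance.
        within = [k for k, (imb, _) in candidates.items() if imb <= threshold]
        best_k = max(within) if within else min(candidates, key=lambda k: candidates[k][0])
        return candidates[best_k][1]

With a threshold of 0, the fallback picks the most balanced bin count; raising the threshold permits more, less equal bins, consistent with the results described below.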
Because there are multiple RHS components representing different industries and different industries likely have unique distributions of variables, binning must be accomplished for each RHS independently. The graph depicted in
Bins should be of equal height to promote inclusion of each bin in the rules generation process. For example, if a set of four bins were created so that the first bin contained 1% of the population, the second contained 5%, the third contained 24%, and the fourth contained the remaining 70%, the fourth bin would appear in most or every rule selected. The third bin may appear in a few rules selected and the first and second bins would likely not appear in any rules. If this type of pattern appears naturally in the data (as in the graphs above), the bins should be formed to include as equal a percentage of claims in each bucket as possible. In this example, two bins would be produced with 30% and 70% of the claims in each bin respectively.
Creating binary bins has the advantage of increasing the probability that each variable will be included in at least one rule, but reduces the amount of information available. Thus, this technique should only be used when a particular variable is not found in any selected rules but is believed to be important in distinguishing normal claims from abnormal claims.
Binary bins are created using either the median, mode, or mean of the numeric variable. Generally, the median works best. However, the choice of the central measure should be selected such that the variable is cut as symmetrically as possible. Viewing each variable's histogram will aid determination of the correct choice.
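A small sketch of this binary split, choosing whichever central measure cuts the variable most symmetrically (closest to a 50/50 split):

    import pandas as pd

    def binary_bin(values):
        cuts = {
            "median": values.median(),
            "mean": values.mean(),
            "mode": values.mode().iloc[0],
        }
        # Pick the cut point whose below-or-equal share is closest to 50%.
        best_cut = min(cuts.values(), key=lambda c: abs((values <= c).mean() - 0.5))
        return (values > best_cut).astype(int)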
A figure referenced here graphically shows the number of previous employers for blue collar applicants.
Depending on the algorithm deployed to create rules, categorical variables may need to be split into 0-1 binary variables. For instance, the variable gender would be split into two variables male and female. If gender=‘male’ then the male variable would be set to 1 and it would be set to 0 otherwise and vice versa for the female variable. Other common categorical variables include:
The following algorithm (see also
FIGS. 14a-14d (which can be applicable to both auto BI and UI claims) show the results of applying the algorithm to the applicant's age with a maximum of 6 bins and threshold values of 0.0 and 0.10, respectively. With a threshold of 0, 4 bins are selected, with a slight height difference between the first bin and the other bins. With a threshold of 0.10 (bins are allowed to differ more widely), 6 bins are selected, and the variation is larger between the first two bins and the last four bins.
An initial set of variables to consider for association rules creation is developed to ensure that variables known to associate with fraudulent claims are entered into the list. The variable list is generally enhanced by adding macro-economic and other indicators associated with the applicant, state, or MSA. Additionally, synthetic variables may be added, such as the time between the current application and the last filed application, or the total number of past accounts and the average total payments from previous accounts.
Highly correlated variables should not be used as they will create redundant but not more informative rules. For example, the weekly benefit amount and the maximum benefit amount are functionally related. Having both of the variables on the data set would likely result in one of them on the LHS and the other on the RHS, but this relationship is known and not informative. Most variables from this initial list are then naturally selected as part of the association rules development. Many variables which do not appear in the LHS given the selected support and confidence levels are eliminated from consideration. However, it is possible that some variables which do not appear in rules initially may become part of the LHS if highly frequent variables which add little information are removed.
Variables with high frequency values may result in poor performing “normal” rules. For example, the construction industry is largely dominated by male workers. A rule describing the normal UI application for this industry would indicate that being male is normal if a variable indicating gender were used. However, this rule may not perform well, as it would indicate that any female applicant is anomalous, even though females may not commit fraud at higher rates than males. Thus, the rule would not segment the population into high fraud and low fraud groups. When this occurs, the variable should be eliminated from the rules generation process.
In Table 27 above, MAX_ELIG_WBA_AMT=<292.5 appears as the RHS, with every LHS containing MBA_ELIG_AMT_LIFE=<7605.0. This result is not informative, since the RHS is just a multiple of the LHS. Further, the RHS is largely dependent on the industry (Health Care in this case). Thus, other LHS components are also less informative in combination with MAX_ELIG_WBA_AMT on the RHS. Removing both variables would allow other LHS components to enter consideration and promote the Health Care industry NAICS Descriptions on the RHS. Table 28 below shows a sample of rules with support and confidence in the same range, but which are more informative.
As noted above repeatedly, the goal of the association rules scoring process is to find claims which are abnormal. However, association rules are geared to finding highly frequent item sets rather than anomalous combinations of items. Thus, rules are generated to define normal, and any claim not fitting these rules is deemed abnormal. Accordingly, rules generation is accomplished using only data defining the normal claim. If the data contains a flag identifying cases adjudicated as fraudulent, those claims should be removed from the data prior to creation of the association rules, since these claims are anomalous by default. Rules are then created using the data which do not include previously identified fraudulent claims.
Optionally, additional rules may be created using only the claims previously identified as fraudulent and selecting only those rules which contain the fraud indicator on the RHS. In practice, the results of this approach are limited when used independently. However, combining rules which identify fraud on the RHS with rules that identify normal UI claims may improve predictive power. This is accomplished by running all claims through the normal rules and flagging any claims which do not meet the LHS condition but satisfy the RHS condition. These abnormal claims are then processed through the fraud rules and claims meeting the LHS condition are flagged for further investigation. Examples of these types of rules are shown in Table 29 below.
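A minimal sketch of this two-stage screen, again assuming rules are (lhs, rhs) predicate pairs over a claim dict:

    def two_stage_flag(claim, normal_rules, fraud_rules):
        # Stage 1: a claim is abnormal if some normal rule's RHS holds for it
        # but the rule's LHS profile does not.
        abnormal = any(rhs(claim) and not lhs(claim) for lhs, rhs in normal_rules)
        if not abnormal:
            return False
        # Stage 2: among abnormal claims, flag those matching any fraud-rule LHS.
        return any(lhs(claim) for lhs, _rhs in fraud_rules)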
It is noted that these anomalous rules have a very low support but high confidence. Thus, having a master's degree is not common among all industries, but when it does occur, there is a 98% probability that the applicant works in a White Collar industry.
Use of both normal and anomalous rules is described above in connection with
As previously discussed, the algorithms for quantifying association rules produce rules of the form: LHS implies RHS with underlying Support and Confidence (Support being the probability of the LHS event happening: P(LHS)=Support; Confidence being the conditional probability of the RHS given the LHS: P(RHS|LHS)=Confidence).
For example, let LHS={Age between 28 and 40, Bachelor's Degree=True} and RHS={White Collar Worker}. Bachelor's degrees are somewhat uncommon in general and are less common in the 28 to 40 age bracket, so the support of this rule is only 8%. However, the confidence is 97%: among applicants aged 28 to 40 who hold a bachelor's degree, 97% are white collar workers. The probability of the full event would then be 0.08 × 0.97 ≈ 0.078, or about 7.8%. That is, 7.8% of all applications would fit this rule.
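A short sketch of this arithmetic, computing support and confidence from boolean masks over the application data (the mask construction is assumed):

    # lhs_mask and rhs_mask are aligned pandas boolean Series over the applications.
    def rule_metrics(lhs_mask, rhs_mask):
        support = lhs_mask.mean()               # P(LHS)
        confidence = rhs_mask[lhs_mask].mean()  # P(RHS | LHS)
        return support, confidence, support * confidence  # P(LHS and RHS)

    # With support 0.08 and confidence 0.97, the joint probability is
    # 0.08 * 0.97 = 0.0776, i.e., about 7.8% of all applications fit the rule.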
Most association rules algorithms require a support threshold to prune the vast number of rules created during processing. A low support threshold (~5%) would create millions or even tens of millions of rules, making the evaluation process difficult or impossible to accomplish. As such, a higher threshold should be selected. This can be done incrementally by choosing an initial support value of 90% and increasing or decreasing the threshold until a manageable number of rules is produced. Generally, 1,000 rules is a good upper bound. The confidence level will further reduce the number of rules to be evaluated.
Using association rules and features of the application related to the applicant's industry, we construct multiple independent rules with high support and confidence. The goal is to find rules which describe “normal” applications within a particular industry. What is desired are rules of the form LHS=>{industry} in which the rules are of high Confidence. Support is used to reduce the number of rules to the least possible number needed to produce the highest rate of true positives and lowest rate of false negatives when compared against the fraud indicator. Table 30 below sets forth example output of an association rules algorithm with various metrics displayed.
The first three rules would be kept in this example, since they have high confidence and high support. This indicates that the application elements in the LHS occur quite frequently (are normal) and that, when they occur, they are often found within the Production Occupations. Thus, these describe normal Production Occupation applications. The next two rules have high confidence but low support; these are abnormal Production Occupation applications and may be considered for a secondary set of anomalous rules. The last two rules have lower support and confidence and should be removed altogether.
To evaluate individual rules, first subset the data into those claims which satisfy the RHS condition (e.g., they fall within the given industry); then, find all claims that violate the LHS condition and compare the rate of fraud for this subpopulation to the overall rate of fraud in the entire population. Keep the rule if it segments the data such that cases violating the LHS have a higher rate of fraud than the overall population. Eliminate rules for which this subpopulation has the same or a lower rate of fraud compared to the overall population.
Normal rules are tested on the full dataset. Table 31 above depicts the outcome of a particular rule (columns add to 100%). Note that the fraud rate for the population meeting the rule (Normal=Yes) is 5.2% compared to the fraud rate for the population which does not meet the rule at 8.7%. This indicates a well performing rule which should be kept. When evaluating individual rules, the threshold for keeping a rule should be set low. Generally, if there is improvement in the first decimal place, the rule should be initially kept. A secondary evaluation using combinations of rules will further reduce the number of rules in the final rule set.
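A minimal sketch of this individual-rule test (the DataFrame column name and the normal-rule mask are assumed):

    import pandas as pd

    def keep_rule(df, normal_mask, fraud_col="fraud"):
        # Keep a rule when claims failing its "normal" profile show a higher
        # fraud rate than the population as a whole; the keep-threshold is low.
        overall_rate = df[fraud_col].mean()
        violator_rate = df.loc[~normal_mask, fraud_col].mean()
        return violator_rate > overall_rate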
Once all LHS conditions are tested and the set of LHS rules to keep are determined, test the combined LHS rules against those cases which meet the RHS condition. If the overall rate of fraud is higher than the rate of fraud in the full population, then the set of rules performs well. Given that each rule individually performs well, the combined set generally performs well. However, combining all LHS rules may also eliminate truly fraudulent cases, resulting in a large number of false negatives. If this occurs, test combinations of rules beginning with the best performing rule and iteratively adding the next best rule. Exhaustively test all rule combinations until the set with the highest true positive and true negative rates is found. The ultimate set of rules results in the confusion matrix depicted below, which exhibits a good predictive capability:
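A sketch of the iterative portion of this search, where a user-supplied scoring function (here assumed to be, e.g., true positive rate plus true negative rate on the training data) evaluates each candidate rule set:

    def forward_select(rules, score_fn):
        # rules: candidate rules ordered best-first; score_fn scores a rule set,
        # e.g., true positive rate + true negative rate on the training data.
        selected, best_score = [], float("-inf")
        for rule in rules:
            trial = selected + [rule]
            trial_score = score_fn(trial)
            if trial_score > best_score:  # keep the rule only if it helps
                selected, best_score = trial, trial_score
        return selected, best_score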
The best performing set of “normal” rules may still allow a high false positive rate. In this case the secondary set of anomalous rules described above may improve performance. In Table 32 above, applications that fail the “normal” rules exhibit a fraud rate of 6.8% compared to the overall rate of 4.6%. After applying the anomaly rules to the subset of applications failing the normal rules, the fraud rate of the resulting population increases to 7.8%. Thus, applying the second set of rules produces a better outcome.
Algorithm for Exhaustively Testing Rules for Inclusion (see also
Table 33 below lists the final set of “normal” UI association rules produced:
Table 34 below lists the final set of “anomalous” rules produced:
Scoring of UI Claims Using Generated UI Association Rules:
Scoring of UI claims would proceed in similar fashion as described above for scoring Auto BI claims; to avoid redundancy, that material is not repeated here.
It should be appreciated that the inventive models described herein can be periodically re-calibrated so that rules/insights/indicators/patterns/predictive variables/etc. gleaned from previous applications of the unsupervised analytical methods (including the results of associated SIU investigations) can be fed back as inputs to inform/improve/tweak the fraud detection process.
Indeed, periodically, the clusters and rules should be recalibrated and/or new clusters and rules created in order to identify emerging fraud and ensure that the rules scoring engine remains efficient and accurate. Fraud perpetrators often invent new and innovative schemes as their earlier methods become known and recognized by authorities. The inventive unsupervised analytical methods are uniquely postured to capture patterns that may indicate fraud, without knowing what the precise scheme is. An exemplary system for accomplishing this recalibration task is depicted, for example, in
In addition, a current scoring engine may be monitored with feedback from the SIU and standard claims processing to determine which rules and clusters are detecting fraud most efficiently. This efficiency can be measured in two ways. First, the scoring engine should find a high level of known fraud schemes and previously undetected schemes. Second, the incidence of actual fraud found in claims sent for further investigation should be at least as high, if not higher, than historical rates of fraud detected. The first condition ensures that fraud does not go undetected, and the second condition ensures that the rate of false positives is minimized. Association rules generating many false positives can be modified or eliminated, and new clusters can be created to better identify known fraud patterns. In this way, the scoring engine can be constantly monitored and optimized to create an efficient scoring process.
An example of this type of update for an auto BI claims rule might involve a rule stating that when the respective accident and claimant addresses are within 2 miles of one another, an attorney is hired within 21 days of the accident, the primary insured's vehicle is less than six years old, and the claimant had only a single part damaged, then the claim is likely to be fraudulent. Upon investigation, however, it may be discovered that when the attorney is hired beyond 45 days after the accident, with the remainder of the rule unchanged, there is a greater likelihood of fraud. In such case, the rule can be adjusted to produce better results. As noted, rules and clustering should be updated periodically to capture potentially fraudulent claims as fraudsters continue to create new, as yet undiscovered, schemes.
It will be appreciated that, with the inventive embodiments, insights/indicators surface automatically from the unsupervised analytical methods. While plenty of “red flags” that are tribal wisdom or common knowledge also surface, the inventive embodiments can also turn out insights/indicators that are deeper, more complex, and/or counterintuitive.
By way of example, the clustering process generates clusters of claims with a high number of known red flags combined with other information not previously known. It is known, for example, that when attorneys show up late in the process, or, for example, the claim is just under threshold values, the claim is often fraudulent. As expected, these indicators fall into clusters of claims with high fraud rates. However, the clustering process also finds that these suspicious claims are separated into two groups, with some claims ending up in one cluster and the remaining claims in another cluster, once other variables are considered beyond attorney involvement. In auto BI, for example, when multiple parts of the vehicle are damaged, these claims end up in a different cluster. The additional information spotlights claims that have a higher likelihood of fraud than claims with the original known red flags but not the added information.
Further, suppose when claims are clustered one of the clusters turns out to have many red flags (e.g., attorney shows up late in the process, smaller claim to avoid notice, etc.). Although the claims adjusters may know that some of these things are bad signals, the inventive approach would identify claims with these traits that were not sent to the SIU. The unsupervised analytics would identify that which was supposedly “already known” but not being followed everywhere.
The association rules analysis “finds” associations that make intuitive sense (e.g., side swipe collisions and neck injuries). Although the experienced investigator may know this rule, the unsupervised analytics turns out these other types of rules as well, including ones that were not previously known. Advantageously, the expert does not need to know all the rules beforehand. By way of an example, suppose that:
It should be understood that the modules, processes, systems, and features described hereinabove can be implemented in hardware, hardware programmed by software, software instructions stored on a non-transitory computer readable medium or a combination of the above. Embodiments of the present invention can be implemented, for example, using a processor configured to execute a sequence of programmed instructions stored on a non-transitory computer readable medium. The processor can include, without limitation, a personal computer or workstation or other such computing system or device that includes a processor, microprocessor, microcontroller device, or is comprised of control logic including integrated circuits such as, for example, an Application Specific Integrated Circuit (ASIC). The instructions can be compiled from source code instructions provided in accordance with a suitable programming language. The instructions can also comprise code and data objects provided in accordance with a suitable structured or object-oriented programming language. The sequence of programmed instructions and data associated therewith can be stored in a non-transitory computer-readable medium such as a computer memory or storage device, which may be any suitable memory apparatus, such as, but not limited to ROM, PROM, EEPROM, RAM, flash memory, disk drive and the like.
Furthermore, the modules, processes, systems, and features can be implemented as a single processor or as a distributed processor. Further, it should be appreciated that the process steps described herein may be performed on a single or distributed processor (single and/or multicore). Also, the processes, system components, modules, and sub-modules for the inventive embodiments may be distributed across multiple computers or systems or may be co-located in a single processor or system.
The modules, processors or systems can be implemented as a programmed general purpose computer, an electronic device programmed with microcode, a hard-wired analog logic circuit, software stored on a computer-readable medium or signal, an optical computing device, a networked system of electronic and/or optical devices, a special purpose computing device, an integrated circuit device, a semiconductor chip, and a software module or object stored on a computer-readable medium or signal, for example. Indeed, the inventive embodiments may be implemented on a general-purpose computer, a special-purpose computer, a programmed microprocessor or microcontroller and peripheral integrated circuit element, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmed logic circuit such as a PLD, PLA, FPGA, PAL, or the like. In general, any processor capable of implementing the functions or steps described herein can be used to implement embodiments of the method, system, or a computer program product (software program stored on a non-transitory computer readable medium).
Additionally, in some exemplary embodiments, distributed processing can be used to implement some or all of the disclosed methods, where multiple processors, clusters of processors, or the like are used to perform portions of various disclosed methods in concert, sharing data, intermediate results and output as may be appropriate.
Furthermore, embodiments of the disclosed method, system, and computer program product may be readily implemented, fully or partially, in software using, for example, object or object-oriented software development environments that provide portable source code that can be used on a variety of computer platforms. Alternatively, embodiments of the disclosed method, system, and computer program product can be implemented partially or fully in hardware using, for example, standard logic circuits or a VLSI design. Other hardware or software can be used to implement embodiments depending on the speed and/or efficiency requirements of the systems, the particular function, and/or particular software or hardware system, microprocessor, or microcomputer being utilized. Embodiments of the method, system, and computer program product can be implemented in hardware and/or software using any known or later developed systems or structures, devices and/or software by those of ordinary skill in the applicable art from the description provided herein and with a general basic knowledge of the user interface and/or computer programming arts. Moreover, any suitable communications media and technologies can be leveraged by the inventive embodiments.
It will thus be seen that the objects set forth above, among those made apparent from the preceding description, are efficiently attained, and since certain changes may be made in the above constructions and processes without departing from the spirit and scope of the invention, it is intended that all matter contained in the above description or shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
This application claims the benefit of U.S. Provisional Patent Application Nos. 61/675,095 filed on Jul. 24, 2012, and 61/783,971 filed on Mar. 14, 2013, the disclosures of which are hereby incorporated herein by reference in their entireties.