The present disclosure relates to machine learning systems. More particularly, the present disclosure relates to systems and methods for using machine learning to generate a model from audited data. Still more particularly, the present disclosure relates to applying the model generated from audited data to process new data for prediction and analysis.
One problem for complex processing systems is ensuring that they are operating within desired parameters. One prior art method for ensuring that complex processing systems are operating within desired parameters is to conduct a manual audit of the information used to make a decision and the decision made on that information. The problem with such an approach is that typically the audit is performed at a time well after the decision is made. Another problem is making use of this data retrieved from performing the audit to effectively improve how the complex processing system operates on new data. These are just some of the problems in using audit information to improve the operation of the complex processing systems.
The present disclosure overcomes the deficiencies of the prior art by providing a system and method for generating a model from audited data and systems and methods for using the model generated from the audited data to process new data. In one embodiment, the system of the present disclosure includes: a plurality of data sources, a training server having a machine learning unit, a prediction/scoring server having a machine learning predictor, and a data repository. The training server is coupled to receive and process information from the plurality of data sources. The training server processes the information received from the plurality of data sources and stores it in the data repository. The training server, in particular the machine learning unit, fuses the input data and ground truth data. The machine learning unit applies machine learning to the fused input data and ground truth data to create a model. The machine learning unit then provides the model to the prediction/scoring server for use in processing new data. The prediction/scoring server uses the model to process new data and provide or take actions prescribed by the model.
In general, another innovative aspect of the subject matter described in this disclosure may be embodied in a method for generating a model from audited data comprising: receiving input data; receiving ground truth data; fusing the input data and the ground truth data to create fused data; and applying machine learning to create a model from the fused data.
Other aspects include corresponding methods, systems, apparatus, and computer program products for these and other innovative aspects. These and other embodiments may each optionally include one or more of the following features.
For instance, the operations further include receiving unprocessed data, processing the unprocessed data with the model created from the fused data to identify an action, and one or more of providing the action and performing the action. For instance, the operations further include identifying a common identifier, fusing the input data and the ground truth data using the common identifier, and performing data preparation on the fused data. For instance, the features further include the input data relating to a complex processing workflow. For instance, the features further include the ground truth data being received from an auditor. For instance, the features further include the model including one or more of a classification model, a regression model, a ranking model, a semi-supervised model, a density estimation model, a clustering model, a dimensionality reduction model, a multidimensional querying model and an ensemble model. For instance, the features further include the action including one or more of a preventive action, generating a notification, generating qualitative insights, identifying a process from the input data for additional review, requesting more data, delaying the action, determining causation, and updating the model. For instance, the features include the ground truth data including one or more of validity data, qualification data, quantification data, correction data, preference data, likelihood data or similarity data.
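The fusing operation described above can be sketched in a few lines. This is a minimal illustration, not the disclosed implementation: the common identifier `claim_id` and the field names are hypothetical, and records without matching ground truth keep a `None` label, mirroring unaudited claims.

```python
# Sketch: fusing input data and ground truth data on a common identifier
# (a hypothetical "claim_id" field). Claims that were never audited have
# no ground truth row and therefore receive a None label.

def fuse(input_rows, ground_truth_rows, key="claim_id"):
    """Join input rows with the ground truth rows that share the same key."""
    truth_by_key = {row[key]: row for row in ground_truth_rows}
    fused = []
    for row in input_rows:
        truth = truth_by_key.get(row[key], {})
        merged = dict(row)
        merged["label"] = truth.get("label")  # None when the claim was not audited
        fused.append(merged)
    return fused

input_rows = [
    {"claim_id": 1, "amount": 1200},
    {"claim_id": 2, "amount": 90},
    {"claim_id": 3, "amount": 4500},
]
ground_truth_rows = [
    {"claim_id": 1, "label": "legitimate"},
    {"claim_id": 3, "label": "illegitimate"},
]

fused = fuse(input_rows, ground_truth_rows)
```

The fused rows carry both the original input attributes and the audit label, which is the form of data the model creation step consumes.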
The present disclosure is particularly advantageous because the model learned from the audited data may process the new incoming data to identify whether there is a deviation from an expected norm and prescribe an interventional action that may prevent the deviation from happening. The model learned from the audited data may also process unaudited data to detect possible deviations from the norm and obtain an insight into the mechanisms responsible for the deviation.
The features and advantages described herein are not all-inclusive and many additional features and advantages should be apparent to one of ordinary skill in the art in view of the figures and description. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and not to limit the scope of the inventive subject matter.
The disclosure is illustrated by way of example, and not by way of limitation in the figures of the accompanying drawings in which like reference numerals are used to refer to similar elements.
A system and method for generating a model from audited data and systems and methods for using the model generated from audited data to process new data are described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosure. It should be apparent, however, that the disclosure may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the disclosure. For example, the present disclosure is described in one embodiment below with reference to particular hardware and software embodiments. However, the present disclosure applies to other types of embodiments distributed in the cloud, over multiple machines, using multiple processors or cores, using virtual machines or integrated as a single machine.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment. In particular the present disclosure is described below in the context of multiple distinct architectures and some of the components are operable in multiple architectures while others are not.
Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers or memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
Aspects of the method and system described herein, such as the logic, may also be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (PLDs), such as field programmable gate arrays (FPGAs), programmable array logic (PAL) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits (ASICs). Some other possibilities for implementing aspects include: memory devices, microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc. Furthermore, aspects may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. The underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (MOSFET) technologies like complementary metal-oxide semiconductor (CMOS), bipolar technologies like emitter-coupled logic (ECL), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, and so on.
Finally, the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems should appear from the description below. In addition, the present disclosure is described without reference to any particular programming language. It should be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.
The network 106 is a conventional type, wired or wireless, and may have any number of different configurations such as a star configuration, token ring configuration or other configurations known to those skilled in the art. Furthermore, the network 106 may comprise a local area network (LAN), a wide area network (WAN) (e.g., the Internet), and/or any other interconnected data path across which multiple devices may communicate. In yet another embodiment, the network 106 may be a peer-to-peer network. The network 106 may also be coupled to or include portions of a telecommunications network for sending data in a variety of different communication protocols. In some instances, the network 106 includes Bluetooth communication networks or a cellular communications network for sending and receiving data including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, wireless application protocol (WAP), email, etc.
The training server 102 is coupled to the network 106 for communication with other components of the system 100A, such as the workflow auditing system 136, the prediction/scoring server 108, and the data repository 112. In some embodiments, the training server 102 may be either a hardware server, a software server, or a combination of software and hardware. In the example of
The prediction/scoring server 108 is coupled to the network 106 for communication with other components of the system 100A, such as the workflow auditing system 136, the training server 102, and the data repository 112. In some embodiments, the prediction/scoring server 108 may be either a hardware server, a software server, or a combination of software and hardware. In the example of
Although only a single training server 102 is shown in
The data repository 112 is coupled to the training server 102 and the prediction/scoring server 108 via the network 106. The data repository 112 is a non-volatile memory device or similar permanent storage device and media. The data repository 112 stores data and instructions and comprises one or more devices such as a storage array, a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device known in the art. The data repository 112 stores information collected from the workflow auditing system 136. In one embodiment, the data repository 112 may also include a database for storing data, results, transaction histories and other information for the training server 102 and the prediction/scoring server 108.
The workflow auditing system 136 includes one or more data sources associated with a complex processing workflow that allow different types of data or information (automated and non-automated) related to a complex processing task to be provided or input to the training server 102 and/or the prediction/scoring server 108. It should be recognized that the workflow auditing system 136 and components thereof may vary based on the complex processing task that is audited. For clarity and convenience, the disclosure herein occasionally makes reference to examples where the complex processing workflow is insurance claim processing or credit card fraud identification. It should be noted that these are merely examples of complex processing workflows and other complex processing workflows exist and are within the scope of this disclosure. For example, it should be recognized that the disclosure herein may be adapted to complex processing workflows including, but not limited to, enforcement of licenses, royalties, and contracts in general, safety inspections, civil litigations, criminal investigations, college admissions, fraud detection, customer churn, new customer acquisition, preventive maintenance, and tax audits (both by the tax collection agencies for determination of a probability of a return being fraudulent or ranking of the returns according to how much they are underestimating the expected tax owed, and by the entities filing tax statements for estimation of the likelihood of being audited and the potential results of such audit).
In the example context of insurance claims and claim leakage, insurance claims are processed based upon a large amount of data. For example, the information used to determine the correct amount to pay on an insurance claim may include claimant information, profile data, expert witness data, witness data, medical data, investigator data, claims adjuster data, etc. This information is collected and processed and then the claim is paid. Sometime thereafter, an audit may be conducted of a small sampling of all the claims that were paid. As mentioned above, the workflow auditing system 136 and components thereof may vary based on the complex processing task that is audited. In the context of insurance claims and claim leakage, the workflow auditing system 136 may include a plurality of sources (e.g. a plurality of devices) for receiving or generating the above identified information used to determine the correct amount to pay on an insurance claim and the results of the audit conducted.
The plurality of data sources may also include an auditor device that provides an audit of a sample of information and a decision made on the sampled information in the complex processing workflow. The training server 102 processes the information received from the plurality of data sources associated with the workflow auditing system 136, fuses the input data and ground truth data, and applies machine learning to the fused input data and ground truth data to create a model. An example of the workflow auditing system 136 in the example context of insurance claims processed based upon a large amount of data is described in more detail with reference to
In some embodiments, one or more of the data sources 120-134 may be a device of a type that may include a memory and a processor, for example a server, a personal computer, a laptop computer, a desktop computer, a tablet computer, a mobile telephone, a personal digital assistant (PDA), a mobile email device, a portable game player, a portable music player, a television with one or more processors embedded therein or coupled thereto or other electronic device capable of accessing the network 106. In some embodiments, one or more of the data sources 120-134 may be a sensor, for example, an image sensor, a pressure sensor, a humidity sensor, a gas sensor, an accelerometer, etc. capable of accessing the network 106 to provide a corresponding output. In some embodiments, one or more of the data sources 120-134 may include a browser for accessing online services. In the illustrated embodiment, one or more users may interact with the data sources 120-134. The data sources 120-134 are communicatively coupled to the network 106. The one or more users interacting with the data sources 120-134 may provide information in various formats as input data described below with reference to
Each of the data sources 120-134 included within the workflow auditing system 136 is capable of delivering information as described herein. While the system 100B shows only one device 120-134 of each type, it should be understood that the system 100B may include any number of devices 120-134 of each type to collect and provide information for storage in the data repository 112.
As indicated above, the workflow auditing system 136 and the components thereof may vary based on the complex processing workflow. Similarly, the information those components (e.g. data sources) may provide varies and may include various information provided by a user of the system 100B, generated automatically by one or more of the components (e.g. data sources 120-134) of the system 100B, or a combination thereof. In the example context of insurance claims processing, the workflow audit system 136 of system 100B includes the illustrated data sources 120-134 according to one embodiment. The applicant/claimant data device 120 may provide information from a user that initiated an application or claim. The witness/expert data device 122 may provide information from a user that supplies factual information, witness information, or expert information, such as from a doctor or other technical subject matter expert. The evaluator/adjustor data device 124 may provide information from a user that provides an evaluation of an application or that is a claim adjustor. The investigator data device 126 may provide information from a user that is an investigator for an application or claim, for example to identify any missing information or anomalies in the application. The auditor device 128 may provide information from an auditor about a claim, either prior to the processing of the claim or after the processing of the claim (if the latter, this is label or ground truth data). The other information device 132 may provide information from a user of any other type of data used to evaluate or process the application or claim. The relationship device 134 may provide information about relationships of any person or entity associated with the application or claim. In some embodiments, the relationship device 134 may include one or more application interfaces to third party systems for social network information.
In some embodiments, the data sources 120-134 provide data (e.g. to the training server 102) automatically or responsive to being polled or queried. It should be noted that the data sources 124, 126, and 128 are shown within a dashed line 138 as they may be associated with a particular entity such as an insurance company, the Internal Revenue Service or a college admissions office that undergoes and/or performs an audit. In some embodiments, the data sources 120-134 may process and derive the attributes for the type of data they provide. In other embodiments, the responsibility of processing and deriving the attributes is performed by the training server 102. Again, although several of the data sources 120-134 are shown in
Referring again to
Referring again to
While the training server 102 and the prediction/scoring server 108 are shown as separate devices in
Referring now to
The input device 204 may include any device or mechanism for providing data and control signals to the training server 102 and may be coupled to the system directly or through intervening input/output controllers. For example, the input device 204 may include one or more of a keyboard, a mouse, a scanner, a joystick, a touchscreen, a webcam, a touchpad, a barcode reader, an eye gaze tracker, a sip-and-puff device, a voice-to-text interface, etc.
The communication unit 206 is coupled to signal lines 214 and the bus 220. The communication unit 206 links the processor 212 to the network 106 and other processing systems as represented by signal line 214. In some embodiments, the communication unit 206 provides other connections to the network 106 for distribution of files using standard network protocols such as transmission control protocol and the Internet protocol (TCP/IP), hypertext transfer protocol (HTTP), hypertext transfer protocol secure (HTTPS) and simple mail transfer protocol (SMTP) as should be understood by those skilled in the art. In some embodiments, the communication unit 206 is coupled to the network 106 or data repository 112 by a wireless connection and the communication unit 206 includes a transceiver for sending and receiving data. In such embodiments, the communication unit 206 includes a Wi-Fi transceiver for wireless communication with an access point. In some embodiments, the communication unit 206 includes a Bluetooth® transceiver for wireless communication with other devices. In some embodiments, the communication unit 206 includes a cellular communications transceiver for sending and receiving data over a cellular communications network such as via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, wireless application protocol (WAP), email, etc. In still another embodiment, the communication unit 206 includes ports for wired connectivity such as but not limited to USB, SD, or CAT-5, etc.
The output device 208 may include a display device, which may include light emitting diodes (LEDs). The display device represents any device equipped to display electronic images and data as described herein. The display device may be, for example, a cathode ray tube (CRT), liquid crystal display (LCD), projector, or any other similarly equipped display device, screen, or monitor. In one embodiment, the display device is equipped with a touch screen in which a touch sensitive, transparent panel is aligned with the screen of the display device. The output device 208 indicates the status of the training server 102 such as: 1) whether it has power and is operational; 2) whether it has network connectivity; 3) whether it is processing transactions. Those skilled in the art should recognize that there may be a variety of additional status indicators beyond those listed above that may be part of the output device 208. The output device 208 may include speakers in some embodiments.
The memory 210 stores instructions and/or data that may be executed by processor 212. The instructions and/or data may comprise code for performing any and/or all of the techniques described herein. The memory 210 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory or some other memory device known in the art. In one embodiment, the memory 210 also includes a non-volatile memory such as a hard disk drive or flash drive for storing information on a more permanent basis. The memory 210 is coupled by the bus 220 for communication with the other components of the training server 102.
The processor 212 comprises an arithmetic logic unit, a microprocessor, a general purpose controller or some other processor array to perform computations, provide electronic display signals to output device 208, and perform the processing of the present disclosure. The processor 212 is coupled to the bus 220 for communication with the other components of the training server 102. Processor 212 processes data signals and may comprise various computing architectures including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although only a single processor 212 is shown in
The bus 220 represents a shared bus for communicating information and data throughout the training server 102. The bus 220 may represent one or more buses including an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, a universal serial bus (USB), or some other bus known in the art to provide similar functionality. Components coupled to processor 212 by system bus 220 include the input device 204, the communication unit 206, the output device 208, the memory 210, and the machine learning unit 104.
In one embodiment, the machine learning unit 104 includes one or more machine learning models 250, a data collection module 252, a feature extraction module 254, a data fusion module 256, an action module 258, a model creation module 260, an active learning module 262 and a reinforcement learning module 264.
The one or more machine learning models 250 may include one or more example models that may be used by the model creation module 260 to create a model, which is provided to the prediction/scoring server 108. The machine learning models 250 may also include different models that may be trained and modified using the ground truth data received from the auditor device included in the workflow auditing system 136. Depending on the embodiment, the one or more machine learning models 250 may include supervised machine learning models only, unsupervised machine learning models only or both supervised and unsupervised machine learning models. The machine learning models 250 are accessible and provided to the model creation module 260 for creation of a model in accordance with the method of
Referring now to
The classification model 302 is a model that may identify one or more classifications to which new input data belongs. The classification model 302 is created by training the model on the fused data, allowing the model to determine, based on labels from the audited data, which parameters are determinative of the label value. For example, auditing insurance claims and labeling each claim as either legitimate or illegitimate may be used by the model creation module 260 to build a classification model 302 that determines the legitimacy of claims for exclusions such as fraud, jurisdiction, regulation or contract. In another example, auditing credit card purchases and disputes and labeling each transaction as either authorized or unauthorized may be used by the model creation module 260 to build a classification model 302 that determines the valid use of credit cards during purchases for exclusions such as credit card fraud.
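To make the classification idea concrete, the following is a minimal sketch, not the disclosed classification model 302: a nearest-centroid rule trained on toy fused rows, where the two-element feature vectors and the legitimate/illegitimate labels are invented for illustration.

```python
# Sketch of a minimal classifier trained on fused, audited claims.
# The features and the nearest-centroid rule are illustrative stand-ins
# for whatever classification model the model creation module builds.

def train_centroids(rows):
    """Compute a per-label mean feature vector from labeled rows."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Toy fused data: (feature vector, audit label)
training = [
    ([100.0, 1.0], "legitimate"),
    ([120.0, 2.0], "legitimate"),
    ([900.0, 9.0], "illegitimate"),
    ([880.0, 8.0], "illegitimate"),
]
centroids = train_centroids(training)
prediction = classify(centroids, [870.0, 9.5])
```

A new claim whose features sit near the illegitimate centroid is classified as illegitimate, which is the same discriminative behavior the trained model 302 would exhibit on new input data.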
The regression model 304 is a model that may determine a value or value range. By training the regression model 304 on the fused data, the regression model 304 may estimate relationships among variables or parameters. For example, the regression model 304 may be used in insurance claims processing to determine a true amount that should have been paid, a range that should have been used, or some proxy or derivative thereof. In some embodiments, the model creation module 260 creates a regression model 304 that outputs the difference between what was determined to be paid during the audit and what should have been paid.
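The relationship-estimation behavior of regression model 304 can be sketched with a one-variable least-squares fit. This is purely illustrative: the "billed amount" feature and "overpayment" target are hypothetical, and real fused data would carry many more variables.

```python
# Sketch of a one-variable least-squares fit, standing in for a
# regression model trained on audited claims: billed amount -> the
# overpayment found by the auditor. All values are made up.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    return slope, mean_y - slope * mean_x

billed = [100.0, 200.0, 300.0, 400.0]
overpaid = [10.0, 20.0, 30.0, 40.0]
slope, intercept = fit_line(billed, overpaid)
predicted = slope * 250.0 + intercept  # estimated overpayment on a new claim
```

The fitted line plays the role of the model that outputs the difference between what was paid and what should have been paid.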
The ranking model 306 is a model that may determine a ranking or ordering based on true value or a probability of having a value for a parameter. The ranking model 306 may provide a ranked list of applications or claims from the greatest to the least difference from a true value. The order is typically induced by forcing an ordinal score or a binary judgment. The ranking model 306 may be trained, by the model creation module 260, with a partially ordered list including the input data and the label data. The ranking model 306 is advantageous because it may include more qualitative opinions and may be used to represent multiple objectives.
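The output stage of ranking model 306 can be illustrated as follows; the per-claim deviation scores are invented for illustration and would in practice come from a trained model.

```python
# Sketch of a ranking model's output stage: order claims by a score
# that proxies the difference from the audited true value, greatest
# deviation first.

def rank_claims(claims):
    """Return claim ids sorted by predicted deviation, largest first."""
    return [cid for cid, score in sorted(claims.items(), key=lambda kv: kv[1], reverse=True)]

predicted_deviation = {"claim_a": 12.5, "claim_b": 250.0, "claim_c": 0.0}
ranked = rank_claims(predicted_deviation)
```

An auditor can then work the ranked list from the top, focusing review effort on the claims most likely to deviate from their true value.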
The semi-supervised model 308 is a model that uses training data that includes both labeled and unlabeled data. Typically, the semi-supervised model 308 uses a small amount of labeled data with a large amount of unlabeled data. For example, the semi-supervised model 308 is particularly applicable for use on insurance claims or tax filings, where only a small percentage of all claims or tax filings are audited and thus have label data. More specifically, the claims may be labeled with a legitimate value or an illegitimate value for the labeled data and a null value for the unlabeled data in one embodiment. Tax filings may be labeled with an over-paid, under-paid, or paid value for the labeled data and a null value for the unlabeled data in one embodiment. The semi-supervised model 308 attempts to infer the correct labels for the unlabeled data.
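One very simple way to show the semi-supervised idea is a single self-training pass that fills in null labels from the nearest labeled example. This is only a sketch with hypothetical one-dimensional features, not the disclosed semi-supervised model 308.

```python
# Sketch of semi-supervised inference: a small labeled set plus
# unlabeled rows (label None), with missing labels inferred from the
# closest labeled example. One-dimensional features keep it simple.

def infer_labels(rows):
    """Fill in None labels using the closest labeled example."""
    labeled = [(x, lab) for x, lab in rows if lab is not None]
    out = []
    for x, lab in rows:
        if lab is None:
            lab = min(labeled, key=lambda pair: abs(pair[0] - x))[1]
        out.append((x, lab))
    return out

rows = [
    (1.0, "legitimate"),
    (9.0, "illegitimate"),
    (1.5, None),   # unaudited claim
    (8.2, None),   # unaudited claim
]
completed = infer_labels(rows)
```

After the pass, the two unaudited claims carry inferred labels, which is the behavior described above: inferring correct labels for the unlabeled data from the small audited sample.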
The density estimation model 310 is a model created by selecting the labeled rows having a particular value for a single label and training on only those rows. Then the density estimation model 310 may be used to score new data to determine if the new data should have the same value as the label. For example, in the insurance claim context, the density estimation model 310 may, in some embodiments, be trained, by the model creation module 260, only with rows of data that have the label legitimate in the audit column, or trained, by the model creation module 260, only with rows of data that have the label illegitimate in the audit column. Once the model has been trained by the model creation module 260, it may be used (e.g. at the prediction/scoring server 108) to score new data, and the rows may be determined to be labeled legitimate or illegitimate based on the underlying probability density function.
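As an illustration of training on rows of a single label only, the following sketch fits a normal density to claim amounts labeled legitimate and flags new amounts whose density falls below a roughly three-sigma threshold. The data, the Gaussian assumption, and the threshold are all illustrative choices, not the disclosed density estimation model 310.

```python
# Sketch of density estimation: fit a normal density to rows labeled
# "legitimate" only, then flag new values whose density falls below
# a threshold derived from a ~3-sigma cutoff.

import math

def fit_gaussian(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def density(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

legitimate_amounts = [100.0, 110.0, 95.0, 105.0, 90.0]
mean, var = fit_gaussian(legitimate_amounts)
threshold = density(mean + 3 * math.sqrt(var), mean, var)

def looks_legitimate(amount):
    return density(amount, mean, var) >= threshold
```

A claim amount near the legitimate mass scores high density and passes; an extreme amount scores low density and would be routed for review.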
The clustering model 312 is a model that groups sets of objects such that objects in the same group are more similar to each other than to objects in other groups; the groups are occasionally referred to as clusters. For example, insurance claims or applications may be clustered based on parameters of the claims. The clustering model 312 created, by the model creation module 260, may assign a label to each cluster based on the claims in that cluster being labeled as legitimate or illegitimate. New claims may then be scored (e.g. at the prediction/scoring server 108) by assigning the claim to a cluster and determining the label assigned to that cluster.
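The cluster-then-label-by-majority scoring just described can be sketched with a tiny two-means pass over one-dimensional claim amounts. Everything here, the amounts, the audit labels, the two-cluster choice, is illustrative rather than the disclosed clustering model 312.

```python
# Sketch: cluster one-dimensional claim amounts into two groups, label
# each cluster by the majority audit label of its members, then score
# a new claim by the label of its nearest cluster.

from collections import Counter

def two_means(values, iters=10):
    """Cluster scalars into two groups; returns (centroids, assignments)."""
    c = [min(values), max(values)]
    for _ in range(iters):
        groups = [[], []]
        for v in values:
            groups[0 if abs(v - c[0]) <= abs(v - c[1]) else 1].append(v)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return c, [0 if abs(v - c[0]) <= abs(v - c[1]) else 1 for v in values]

amounts = [100.0, 110.0, 105.0, 900.0, 950.0]
labels = ["legitimate", "legitimate", "illegitimate", "illegitimate", "illegitimate"]
centroids, assign = two_means(amounts)

# Majority audit label per cluster
cluster_label = {
    k: Counter(l for l, a in zip(labels, assign) if a == k).most_common(1)[0][0]
    for k in set(assign)
}

def score_new_claim(amount):
    k = 0 if abs(amount - centroids[0]) <= abs(amount - centroids[1]) else 1
    return cluster_label[k]
```

A new claim inherits the majority label of the cluster it falls into, matching the scoring behavior described for the prediction/scoring server.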
It should be recognized that the use of ground truth from audited data with an unsupervised machine learning model is not incompatible and may allow for interesting use cases. For example, let us consider clustering, which is commonly considered an unsupervised machine learning model. When the ground truth is used to identify a “correct” clustering, this is classification (i.e. supervised). When the ground truth data is used to indicate one or more of certain members (e.g. claims) that should be in the same cluster, how many clusters should exist (e.g. overpaid, underpaid and correctly paid), where the center of a cluster should be, etc., this is semi-supervised. However, unsupervised clustering may be used, in some embodiments, to identify one or more clusters of applicants that are consistently flagged (according to ground truth) and identify the one or more properties associated with each of the one or more clusters. The ground truth data may also be used to validate an unsupervised model created by the model creation module 260.
The dimensionality reduction model 314 is a model that reduces the number of variables under consideration using one or more of feature selection and feature extraction. Examples of feature selection may include filtering (e.g. using information gain), wrapping (e.g. search guided by accuracy), embedding (variables are added or removed as the model creation module 260 creates the model based on prediction errors), etc. For example, in the credit card fraud context, the dimensionality reduction model 314 may be used by the model creation module 260 to generate a model that identifies a transaction as fraudulent or non-fraudulent based on a subset of the received input data.
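A minimal sketch of filter-style feature selection, assuming the scikit-learn library; the synthetic transaction data and the use of mutual information (an information-gain measure) with k=2 are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
# 200 transactions, 5 features; only feature 0 carries the fraud signal.
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5))
X[:, 0] += 3 * y  # informative feature

# Filtering: keep the 2 features with the highest mutual information
# with the fraud label, discarding the remaining variables.
selector = SelectKBest(
    lambda X, y: mutual_info_classif(X, y, random_state=0), k=2).fit(X, y)
X_reduced = selector.transform(X)
```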
The multidimensional querying model 316 is a model that finds the closest or most similar points. An example of a multidimensional querying model 316 is nearest neighbors; however, it should be recognized that other multidimensional querying models exist, and their use is contemplated and within the scope of this disclosure. For example, in the credit card fraud context, a transaction may be identified as fraudulent or non-fraudulent based on the label(s) of its nearest neighbors.
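As a non-limiting sketch of the nearest-neighbors example, assuming the scikit-learn library; the feature semantics and the choice of three neighbors are hypothetical:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Past transactions with known fraud labels (hypothetical features,
# e.g. amount and distance from the cardholder's home, both scaled).
X = np.array([[0.1, 0.1], [0.2, 0.3], [0.15, 0.2],
              [9.0, 8.0], [8.5, 9.5], [9.5, 9.0]])
y = np.array(["ok", "ok", "ok", "fraud", "fraud", "fraud"])

# A new transaction takes the majority label of its 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
prediction = knn.predict([[8.8, 9.1]])[0]
```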
The ensemble model 318 is a model that uses multiple constituent machine learning algorithms. For example, in one embodiment, the ensemble model 318 may be boosting and, in the context of insurance claims, the ensemble model 318 is used by the model creation module 260 to incrementally build a model by training each new model instance to emphasize training instances (e.g. claims) misclassified by the previous instance(s). It should be recognized that boosting is merely one example of an ensemble model and other ensemble models exist and their use is contemplated and within the scope of this disclosure.
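One possible boosting sketch, assuming the scikit-learn library; the synthetic claim features, the interaction-based labeling rule, and the use of AdaBoost over shallow decision trees are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
# Hypothetical rule: a claim is illegitimate when two features interact.
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)

# Boosting: each new weak learner (a shallow tree) up-weights the
# training claims misclassified by the ensemble built so far.
boosted = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2), n_estimators=50)
boosted.fit(X, y)
accuracy = boosted.score(X, y)
```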
The data collection module 252 may include software and routines for collecting data from the workflow auditing system 136. For example, the data collection module 252 receives or retrieves data from the plurality of data sources 120-134 included in the workflow auditing system 136 as shown in the example of
The feature extraction module 254 may include software and routines for performing feature extraction on the data collected and stored by the data collection module 252 in the data repository 112. The feature extraction module 254 may perform one or more feature extraction techniques. In some embodiments, the feature extraction module 254 may be a set of instructions executable by the processor 212 to provide the functionality for performing feature extraction on the data collected by the data collection module 252. In some other embodiments, the feature extraction module 254 may be stored in the memory 210 of the training server 102 and may be accessible and executable by the processor 212. The feature extraction module 254 may be adapted for cooperation and communication with the processor 212 and other components of the training server 102 via the bus 220.
The data fusion module 256 may include software and routines for performing data fusion between the ground truth data and the other input data collected by the data collection module 252. The data fusion module 256 may perform a join or other combination of the features extracted from the ground truth data and the input data by the feature extraction module 254. In one embodiment, the data fusion module 256 identifies a common identifier, i.e. an identifier in both the ground truth data and the input data, and uses the common identifier to fuse ground truth data and input data. For example, in one embodiment, the data fusion module 256 automatically (i.e. without user intervention) identifies an identifier (e.g. an insurance claim number) common to ground truth data (e.g. audit data) and input data and fuses the input data and ground truth data using the common identifier. For purposes of this application, the terms “label” and “ground truth data” are used interchangeably to mean the same thing, namely, a ground truth value determined from the performance of an audit, for example, of a process. In some embodiments, the data fusion module 256 performs data preprocessing, occasionally referred to as data preparation, on the fused data or inputs thereof (e.g. ground truth data or input data). For example, data preprocessing may include data cleaning, removal of outliers, identifying and treating missing values, transformation of values, etc. In a particular example case of text data, this may include bag-of-words transformation, stemming, stop word removal, topic modeling, etc. In some embodiments, the data fusion module 256 may be a set of instructions executable by the processor 212 to provide the functionality for performing data fusion. In some other embodiments, the data fusion module 256 may be stored in the memory 210 of the training server 102 and may be accessible and executable by the processor 212.
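By way of a non-limiting illustration, the join on a common identifier may be sketched as follows, assuming the pandas library; the column names (claim_number, audit) and values are hypothetical:

```python
import pandas as pd

# Input data keyed by a claim number, and ground truth (audit) data
# keyed by the same identifier.
input_data = pd.DataFrame({
    "claim_number": [101, 102, 103],
    "amount": [1200.0, 80.0, 450.0],
})
ground_truth = pd.DataFrame({
    "claim_number": [101, 103],
    "audit": ["legitimate", "illegitimate"],
})

# Fuse on the common identifier; claims without an audit record keep
# a null label, matching the labeled/unlabeled split described above.
fused = input_data.merge(ground_truth, on="claim_number", how="left")
```

A left join preserves every input row, so unaudited claims remain available as unlabeled data for semi-supervised training.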
The data fusion module 256 may be adapted for cooperation and communication with the processor 212 and other components of the training server 102 via the bus 220.
The action module 258 may include software and routines for determining and prescribing an action that should be performed based on the prediction of the model and any applied constraints. In some embodiments, the action module 258 may be a set of instructions executable by the processor 212 to provide the functionality for prescribing an action that should be performed based on the prediction of the model. In some other embodiments, the action module 258 may be stored in the memory 210 of the training server 102 and may be accessible and executable by the processor 212. The action module 258 may be adapted for cooperation and communication with the processor 212 and other components of the training server 102 via the bus 220.
The model creation module 260 may include software and routines for creating a model (to send to the prediction/scoring server 108) by applying machine learning to the fused data received from the data fusion module 256. In some embodiments, the model creation module 260 may be a set of instructions executable by the processor 212 to provide the functionality for applying machine learning. In some other embodiments, the model creation module 260 may be stored in the memory 210 of the training server 102 and may be accessible and executable by the processor 212. The model creation module 260 may be adapted for cooperation and communication with the processor 212 and other components of the training server 102 via the bus 220.
As should be recognized by the discussion above with regard to machine learning models 302-318, the type of model chosen and used by the model creation module 260 depends on the specific task and the data (including fused data) available. For example, if the goal is to determine the amount of leakage on any particular claim in an insurance claims processing workflow, then a regression model 304 is trained by the model creation module 260 from the previously audited claims and the amounts of leakage found in these claims upon review. Leakage refers to a difference between what was paid and what should have been paid (often when what was paid exceeds what should have been paid). Once the model has been created, it may be used to process new or additional data (e.g. unprocessed and/or new insurance claims). In another example, if the goal is to prioritize which among a group of tax documents/returns should be selected for a review in a tax return processing workflow, then a ranking model 306 may be trained by the model creation module 260 on the set of previously available tax documents with the previous auditors' choices of which of these documents to review (i.e. fused data), and the results of the reviews used as labels. The model creation module 260 selects one of the machine learning models 250 for use by the prediction/scoring server 108. It should be noted that the models generated by the model creation module 260 are distinct in that they incorporate information from the ground truth data. Within each model, the system 100A may incorporate competing labels, for example, labels that have been provided by multiple experts or auditors (which may or may not be in agreement).
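The leakage-regression example above may be sketched as follows, as a non-limiting illustration assuming the scikit-learn library; the synthetic audited-claim features and the linear relationship between the first feature and leakage are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fused training rows: features of previously audited claims and the
# leakage amount (paid minus what should have been paid) found on audit.
rng = np.random.default_rng(2)
features = rng.uniform(0, 10, size=(100, 3))
leakage = 50.0 * features[:, 0] + rng.normal(0, 5, size=100)

# Train a regression model on the audited claims, then estimate the
# leakage amount for a new, unaudited claim.
model = LinearRegression().fit(features, leakage)
predicted_leakage = model.predict([[4.0, 2.0, 7.0]])[0]
```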
The active learning module 262 may include software and routines for performing active learning. For example, active learning may include identifying particular data or rows that have particular attributes that may be used to improve the model generated by the model creation module 260, determining which features are more important to model accuracy, identifying missing information corresponding to those attributes, and trying to secure additional information to improve the performance of the model generated by the model creation module 260. For example, the active learning module 262 may cooperate with the data sources 120-134 in the workflow auditing system 136 to secure the additional information (e.g. from one or more users) under the constraints of what is permissible under the applicable laws. In some embodiments, the active learning module 262 may be a set of instructions executable by the processor 212 to provide the functionality for performing active learning. In some other embodiments, the active learning module 262 may be stored in the memory 210 of the training server 102 and may be accessible and executable by the processor 212. The active learning module 262 may be adapted for cooperation and communication with the processor 212 and other components of the training server 102 via the bus 220.
The reinforcement learning module 264 may include software and routines for performing reinforcement learning where the model generated accounts for the future consequences of taking a particular action and tries to identify an optimal action. The reinforcement learning module 264 may identify particular changes based on the predicted action or look for tipping points at which the recommended action has different or greater consequences. In some embodiments, the reinforcement learning module 264 may be a set of instructions executable by the processor 212 to provide reinforcement learning. In some other embodiments, the reinforcement learning module 264 may be stored in the memory 210 of the training server 102 and may be accessible and executable by the processor 212. The reinforcement learning module 264 may be adapted for cooperation and communication with the processor 212 and other components of the training server 102 via the bus 220.
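As a minimal, non-limiting sketch of accounting for future consequences, consider tabular Q-learning on a tiny hypothetical decision process; the states, actions ("act now" versus "investigate further"), rewards, and learning parameters are all illustrative assumptions and not a prescribed design:

```python
import numpy as np

# Tiny illustrative process: states 0..2. Action 0 ("act now") ends the
# episode with reward 1, except from state 2 where it yields reward 5.
# Action 1 ("investigate") moves one state right with no immediate
# reward -- deferring is the better long-run choice.
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9  # learning rate, discount factor

def step(state, action):
    if action == 0:                        # act immediately, episode ends
        return None, 5.0 if state == 2 else 1.0
    return min(state + 1, 2), 0.0          # defer, no immediate reward

rng = np.random.default_rng(0)
for _ in range(2000):
    s = 0
    while s is not None:
        # Epsilon-greedy action selection.
        a = int(rng.integers(2)) if rng.random() < 0.2 else int(Q[s].argmax())
        s2, r = step(s, a)
        target = r if s2 is None else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

best_action_from_start = int(Q[0].argmax())
```

Although acting immediately from state 0 yields a positive reward, the learned Q-values favor investigating, illustrating how the future consequences of an action are taken into account.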
Referring now to
Those skilled in the art should recognize that some of the components of the prediction/scoring server 108 have the same or similar functionality as some of the components of the training server 102, so descriptions of these components are not repeated here. For example, the input device 416, the communication unit 418, the output device 420, the memory 422, the processor 424, and the bus 426 are similar to those described above.
In one embodiment, the machine learning predictor 110 includes a machine learning model 402, a data collection module 404, a feature extraction module 406, an action module 408, a model updating module 410, an active learning module 412 and a reinforcement learning module 414. The machine learning predictor 110 has a number of applications. First, the machine learning predictor 110 may be used to analyze new data, occasionally referred to as unprocessed data, for the purpose of identifying a mistake or error before it occurs and preventing it. For example, the machine learning predictor 110 may be applied to new data such as a recent insurance claim being processed in an insurance claims processing workflow to predict whether that claim is headed toward leakage. If so, the leakage may then possibly be prevented via interventional action performed by the action module 408. Second, the machine learning predictor 110 may be used to go over new data such as past, unanalyzed data retrieved from the workflow auditing system 136 to identify issues. For example, again in the insurance claim context, the model may be used to go back over past unaudited insurance claims to detect possible leakages. This may be used to obtain deeper insights into the mechanisms responsible for leakage, or even to re-open claims in some cases.
The machine learning model 402 is the mathematical model generated by the machine learning unit 104 that may be used to make predictions and decisions on new data. In some embodiments, the machine learning model 402 may include ensemble methods, model selection, parameter selection and cross validation. It should be understood that the machine learning model 402 is particularly advantageous because the model may operate on partial and incomplete data sets. The machine learning model 402 cooperates with the feature extraction module 406 and the action module 408 to predict an appropriate action based on the features provided by the feature extraction module 406. The machine learning model 402 may be adapted for cooperation and communication with the processor 424 and other components of the prediction/scoring server 108 via the bus 426.
The data collection module 404 may include software and routines for collecting a new set of data from the workflow auditing system 136 for analysis. The data collection module 404 is similar to the data collection module 252 in
The feature extraction module 406 may include software and routines for performing feature extraction on the new set of data collected by the data collection module 404. The feature extraction module 406 is similar to the feature extraction module 254 in
The action module 408 may include software and routines for performing the action specified by the prediction of the machine learning model 402. In some embodiments, the action module 408 may be a set of instructions executable by the processor 424 to provide the functionality described herein for performing the action specified by the prediction of the machine learning model 402. In some other embodiments, the action module 408 may be stored in the memory 422 of the prediction/scoring server 108 and may be accessible and executable by the processor 424. The action module 408 may be adapted for cooperation and communication with the processor 424 and other components of the prediction/scoring server 108 via the bus 426.
The model updating module 410 may include software and routines for updating the machine learning model 402 based on the new information retrieved and processed by the machine learning predictor 110. In some embodiments, the training server 102 and the prediction/scoring server 108 are the same server for optimum operation of the model updating module 410. Moreover, in some embodiments, the model updating module 410 operates continuously so that online learning is performed and the machine learning model 402 is continually being updated. In some other embodiments, the model updating module 410 may be stored in the memory 422 of the prediction/scoring server 108 and may be accessible and executable by the processor 424. The model updating module 410 may be adapted for cooperation and communication with the processor 424 and other components of the prediction/scoring server 108 via the bus 426.
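One possible sketch of such continuous (online) updating, assuming the scikit-learn library; the synthetic batches of newly audited claims and the choice of a stochastic-gradient classifier with incremental fitting are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# An online model that can be updated continually as newly audited
# claims arrive, without retraining from scratch.
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # 0 = legitimate, 1 = illegitimate

rng = np.random.default_rng(3)
for _ in range(50):  # each iteration: a small batch of new audit results
    y_batch = rng.integers(0, 2, size=20)
    X_batch = rng.normal(size=(20, 2)) + 3 * y_batch[:, None]
    # partial_fit updates the existing model in place with the new batch.
    model.partial_fit(X_batch, y_batch, classes=classes)

accuracy_hint = model.score(X_batch, y_batch)  # accuracy on latest batch
```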
The active learning module 412 may include software and routines for performing active learning. For example, active learning may include identifying particular data or rows that have particular attributes that may be used to improve the machine learning model 402, determining which features are more important to model accuracy, identifying missing information corresponding to those attributes, and trying to secure additional information to improve the performance of the machine learning model 402. For example, the active learning module 412 may cooperate with the data sources 120-134 in the workflow auditing system 136 to secure the additional information (e.g. from one or more users) under the constraints of what is permissible under the applicable laws. In some embodiments, the active learning module 412 may be a set of instructions executable by the processor 424 to provide the functionality for performing active learning. In some other embodiments, the active learning module 412 may be stored in the memory 422 of the prediction/scoring server 108 and may be accessible and executable by the processor 424. The active learning module 412 may be adapted for cooperation and communication with the processor 424 and other components of the prediction/scoring server 108 via the bus 426.
The reinforcement learning module 414 may include software and routines for performing reinforcement learning where the machine learning model 402 accounts for the future consequences of taking a particular action and tries to identify an optimal action. The reinforcement learning module 414 may identify particular changes based on the predicted action or look for tipping points at which the recommended action has different or greater consequences. In some embodiments, the reinforcement learning module 414 may be a set of instructions executable by the processor 424 to provide reinforcement learning. In some other embodiments, the reinforcement learning module 414 may be stored in the memory 422 of the prediction/scoring server 108 and may be accessible and executable by the processor 424. The reinforcement learning module 414 may be adapted for cooperation and communication with the processor 424 and other components of the prediction/scoring server 108 via the bus 426.
At block 504, data collection module 252 receives labels or ground truth data. The label may be provided as manual input of an auditor (human being) evaluating a process or result thereof. Alternatively, the label may be provided or derived from an automated auditing procedure (also an auditor) that is applied to a process or result thereof. Examples of labels are described in more detail below with reference to
As illustrated in
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
It should be understood that the model or the action module 408 may also specify, at block 516, a role assigned to each action. For example, in the insurance claim context, one or more of the actions may be taken or caused to be taken by the adjuster, the investigator, the auditor or other person associated with the insuring company. In one embodiment, the model is applied to the real-time processing of data, for example, insurance claims as they are made, to take the appropriate action as determined by the model. In another embodiment, claims that have already been processed are scored with the model to determine the appropriate action. That appropriate action is then compared to the action actually taken on a claim and the discrepancies are examined.
The foregoing description of the embodiments of the present disclosure has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the present disclosure to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the present disclosure be limited not by this detailed description, but rather by the claims of this application. As should be understood by those familiar with the art, the present disclosure may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the present disclosure or its features may have different names, divisions and/or formats. Furthermore, as should be apparent to one of ordinary skill in the relevant art, the modules, routines, features, attributes, methodologies and other aspects of the present disclosure may be implemented as software, hardware, firmware or any combination of the three. Also, wherever a component, an example of which is a module, of the present disclosure is implemented as software, the component may be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of ordinary skill in the art of computer programming. Additionally, the present disclosure is in no way limited to implementation in any specific programming language, or for any specific operating system or environment. 
Accordingly, the disclosure of the present disclosure is intended to be illustrative, but not limiting, of the scope of the present disclosure, which is set forth in the following claims.
The present application claims priority, under 35 U.S.C. §119, of U.S. Provisional Patent Application No. 62/130,501, filed Mar. 9, 2015 and entitled “System and Method for Using Machine Learning to Generate a Model from Audited Data,” which is incorporated by reference in its entirety.