This document relates generally to computer predictive models and more particularly to enterprise data handling and scoring.
Computer predictive models have been used for many years in a diverse range of areas, such as the financial industry. For example, computer predictive models can provide an automated or semi-automated mechanism for determining whether suspicious activity, such as credit card fraud, may have occurred. However, current systems may be limited in the data that they are provided for performing such analyses.
In accordance with the teachings provided herein, systems and methods are provided for storing transaction data associated with transactions of disparate types. Transaction data is received describing a transaction that has occurred, the transaction being performed by a customer of a particular customer type, the transaction being of a particular activity type, and the transaction being performed using a channel of a particular channel type. Transaction data about the customer is stored in a customer segment according to one of a plurality of customer templates, the one of the plurality of customer templates being selected according to the customer type. Transaction data about the activity is stored in an activity segment according to one of a plurality of activity templates, the one of the plurality of activity templates being selected according to the activity type. Transaction data about the channel is stored in a channel segment according to one of a plurality of channel templates, the one of the plurality of channel templates being selected according to the channel type. Data from the customer segment, the activity segment, and the channel segment for the transaction is extracted and scored by a predictive model.
As another example, a system for storing transaction data associated with transactions of disparate types is provided. The system may include one or more data processors and a computer-readable medium encoded with instructions for commanding the one or more data processors to execute steps. In the steps, transaction data is received describing a transaction that has occurred, the transaction being performed by a customer of a particular customer type, the transaction being of a particular activity type, and the transaction being performed using a channel of a particular channel type. Transaction data about the customer is stored in a customer segment according to one of a plurality of customer templates, the one of the plurality of customer templates being selected according to the customer type. Transaction data about the activity is stored in an activity segment according to one of a plurality of activity templates, the one of the plurality of activity templates being selected according to the activity type. Transaction data about the channel is stored in a channel segment according to one of a plurality of channel templates, the one of the plurality of channel templates being selected according to the channel type. Data from the customer segment, the activity segment, and the channel segment for the transaction is extracted and scored by a predictive model.
As a further example, a computer-readable medium may be configured to store transaction data associated with transactions of disparate types, the transaction data describing a transaction that has occurred, the transaction being performed by a customer of a particular customer type, the transaction being of a particular activity type, and the transaction being performed using a channel of a particular channel type. The computer-readable medium may include a customer segment on the computer-readable medium formatted according to one of a plurality of customer templates, the one of the plurality of customer templates being selected for the customer segment according to the customer type. The computer-readable medium may further include an activity segment on the computer-readable medium formatted according to one of a plurality of activity templates, the one of the plurality of activity templates being selected according to the activity type, and a channel segment on the computer-readable medium formatted according to one of a plurality of channel templates, the one of the plurality of channel templates being selected according to the channel type. The computer-readable medium may be configured to provide data associated with the transaction from the customer segment, the activity segment, and the channel segment to a predictive model for scoring.
For example, an enterprise data management system 102 may be utilized in a banking system. Banks have traditionally gathered and analyzed large amounts of data related to customer actions. For example, a bank may track a history of a particular customer's credit card usage. Based on the tracked history, the bank may identify certain parameters that are consistent with the particular customer's credit card usage. For example, the bank may identify a transaction volume parameter, a transaction location parameter, a transaction amount parameter, an on-time balance payment parameter, and a balance amount parameter. These identified parameters may then be used by a predictive model to help score various aspects of a new credit card transaction received for the particular customer.
For example, the transaction volume, transaction location, and transaction amount parameters may be used by a predictive model to identify the likelihood that one or more newly received credit card transactions from a customer are fraudulent. A new transaction may be identified as likely fraudulent for a number of reasons. For example, if a higher than normal volume of transactions for the particular customer is received from a remote country far from the customer's regular transaction locations, one or more transactions may be identified as likely fraudulent and flagged for further investigation.
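The volume-and-location example above can be sketched as a simple rule. This is a minimal illustration, assuming a hypothetical customer profile that records a typical daily transaction count and a set of usual countries; the 3x volume multiplier is an assumed value, not one given in this document. In practice these parameters would be inputs to a predictive model rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical parameters derived from a customer's credit card history.
    typical_daily_count: float
    usual_countries: set[str]


def flag_likely_fraud(profile: Profile, todays_countries: list[str],
                      volume_multiplier: float = 3.0) -> bool:
    """Flag when today's volume is well above normal AND some of it comes
    from a country the customer does not usually transact in."""
    high_volume = len(todays_countries) > volume_multiplier * profile.typical_daily_count
    remote = any(c not in profile.usual_countries for c in todays_countries)
    return high_volume and remote


profile = Profile(typical_daily_count=2.0, usual_countries={"US"})
print(flag_likely_fraud(profile, ["US", "BR", "BR", "BR", "BR", "BR", "BR"]))  # True
print(flag_likely_fraud(profile, ["US", "US"]))  # False
```

Requiring both conditions keeps a busy day at home, or a single purchase abroad, from being flagged on its own.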
As another example, the transaction volume, transaction amount, on-time balance payment parameter, and balance amount parameter may be used to score a new transaction to determine whether the new transaction should be authorized from a credit risk perspective. For example, if a newly received transaction is for a higher than usual transaction amount, and the particular customer has a higher than usual balance amount, then the bank may wish to decline the transaction as too risky. However, if the particular customer has a high on-time balance payment parameter, a higher amount of trust may be afforded to the particular customer by the bank, and the credit card transaction may be approved despite being out of the bounds of certain parameters for the particular customer.
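The credit risk reasoning above can be written as a small decision function. This is a hypothetical sketch: `on_time_rate` (the fraction of balance payments made on time) and the 0.95 trust threshold are assumptions chosen for illustration, and a production system would score these parameters with a trained model rather than hard-coded comparisons.

```python
def authorize(amount: float, usual_amount: float, balance: float,
              usual_balance: float, on_time_rate: float,
              trust_threshold: float = 0.95) -> bool:
    """Return True to approve and False to decline (illustrative rule only)."""
    risky = amount > usual_amount and balance > usual_balance
    if not risky:
        return True
    # A strong on-time payment record earns extra trust and can override the risk signal.
    return on_time_rate >= trust_threshold


print(authorize(900, 150, 4000, 1200, on_time_rate=0.60))  # False
print(authorize(900, 150, 4000, 1200, on_time_rate=0.99))  # True
```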
Scoring a set of transactions of a particular type may be highly useful for analyzing certain aspects of those transactions. However, an additional level of synergy may be achieved by analyzing a number of transactions of disparate types. For example, fraud being perpetrated against a particular customer whose identity has been stolen may be better identified by looking at all transactions associated with the particular customer rather than at transactions of single types in isolation. A fraudster may make small or otherwise hard-to-distinguish transactions purporting to be the particular customer using various channels. For example, the fraudster may purchase gasoline using a falsified credit card, pay a dinner bill using a false debit card, and purchase concert tickets with a bad check drawn on the particular customer's account. None of these transactions may, on its own, be suspicious enough to be flagged as likely fraudulent. Additionally, because the transactions are all of disparate types (i.e., credit card, debit card, check), no aggregation within any one of the transaction types will aid a predictive model in identifying the fraud. However, if all three of the transactions of disparate types are analyzed together, a predictive model may be able to identify that these transactions are likely fraudulent (e.g., the fraudster's credit card gasoline purchase was made in a city 25 miles away from the location of a legitimate point of sale debit card purchase by the particular customer, with the two transactions occurring 3 minutes apart).
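The closing example can be made concrete as a cross-type "impossible travel" check. This is a minimal sketch, assuming a hypothetical record that carries a transaction type, a timestamp, and the latitude and longitude of the point of sale; the 100 mph cutoff is an illustrative assumption. Because the check runs over all of one customer's transactions regardless of type, it can catch a credit card and debit card pair that neither type's isolated data store would reveal.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Txn:
    txn_type: str      # e.g. "credit_card", "debit_card", "check"
    time: datetime
    lat: float
    lon: float


def miles_between(a: Txn, b: Txn) -> float:
    """Great-circle distance using the haversine formula (Earth radius ~3958.8 mi)."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def impossible_travel(txns: list[Txn], max_mph: float = 100.0) -> list[tuple[Txn, Txn]]:
    """Compare time-consecutive transactions of ONE customer across all types."""
    ordered = sorted(txns, key=lambda t: t.time)
    suspicious = []
    for a, b in zip(ordered, ordered[1:]):
        dist = miles_between(a, b)
        if dist == 0:
            continue
        hours = (b.time - a.time).total_seconds() / 3600
        if hours == 0 or dist / hours > max_mph:
            suspicious.append((a, b))
    return suspicious


# Two hypothetical points of sale roughly 25 miles apart, 3 minutes apart in time.
debit = Txn("debit_card", datetime(2024, 5, 1, 19, 0), 40.00, -75.0)
credit = Txn("credit_card", datetime(2024, 5, 1, 19, 3), 40.36, -75.0)
pairs = impossible_travel([credit, debit])
print([(a.txn_type, b.txn_type) for a, b in pairs])  # [('debit_card', 'credit_card')]
```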
Traditional systems are unable to offer such analyses because those systems do not use and store data from disparate transaction types together. For example, a bank will typically have a credit card data store, a debit card data store, a checking account data store, a mortgage account data store, as well as others. Each of these types of transactions is typically analyzed for fraud completely separately. No data is typically shared between systems serving various account types within a bank. Predictive models may be run to analyze data stored in those individual data stores in isolation. One reason for this isolated approach is the data disparity among transactions of disparate types. For example, a record of a credit card purchase may track a credit card number and an expiration date of the credit card used to make the purchase, while a check purchase record may track a routing number and account number associated with a check used to make the purchase. Because it would be highly impractical to store data records containing all fields needed for tracking all possible transaction types, the isolated transaction type data stores have been utilized.
While the above example describes transactions that are of different types based on differing transaction account types (e.g., credit card, debit card, check), transactions can be of disparate types based on variations in other aspects as well. For example, transactions may be of disparate types based on customer type (e.g., business, individual), channels (e.g., point-of-sale, online, mobile phone), authentication type, or activity (e.g., purchase, bill pay, funds transfer).
With reference back to
At 406, segment template selection is performed for the received transaction data 404. The enterprise data management system 402 maintains one or more segment templates 408 for each of a plurality of different segments. For example, for a customer segment, the enterprise data management system 402 may maintain an individual customer template and a business customer template. As another example, for a channel segment, the enterprise data management system 402 may maintain a point of sale template, an online template, and a mobile device template. Based on the attributes associated with a transaction (e.g., customer type, activity type, channel type), a segment template is selected for each segment (e.g., a customer template, an account template, a channel template, an authentication template, an activity template).
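The selection step at 406 can be sketched as a lookup from transaction attributes to one template per segment. The registry, attribute names, and template identifiers below (`CUST_IND`, `CHAN_WEB`, and so on) are hypothetical, and only three of the segments are shown.

```python
# Hypothetical registry: segment -> attribute value -> template identifier.
TEMPLATES = {
    "customer": {"individual": "CUST_IND", "business": "CUST_BUS"},
    "channel": {"point_of_sale": "CHAN_POS", "online": "CHAN_WEB", "mobile": "CHAN_MOB"},
    "activity": {"purchase": "ACT_PUR", "bill_pay": "ACT_BILL", "funds_transfer": "ACT_XFER"},
}

# Which transaction attribute drives the template choice for each segment.
SELECTOR = {"customer": "customer_type", "channel": "channel_type", "activity": "activity_type"}


def select_templates(attributes: dict[str, str]) -> dict[str, str]:
    """Pick one template per segment based on the transaction's attributes."""
    selected = {}
    for segment, attr in SELECTOR.items():
        value = attributes[attr]
        try:
            selected[segment] = TEMPLATES[segment][value]
        except KeyError:
            raise ValueError(f"no {segment} template for {attr}={value!r}") from None
    return selected


print(select_templates({"customer_type": "business", "channel_type": "online",
                        "activity_type": "bill_pay"}))
# {'customer': 'CUST_BUS', 'channel': 'CHAN_WEB', 'activity': 'ACT_BILL'}
```

Keeping the attribute-to-template mapping in data rather than code means a new transaction type usually needs only a new registry entry.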
Once templates are selected at 406, the segments are generated and populated according to the selected templates at 410 with data from the received transaction data 404. The populated segments 412 are provided for storage in one or more data stores 414. The enterprise data contained in the one or more data stores 414 may then be aggregated, scored, or otherwise analyzed at 416 to provide an enterprise data score 418. For example, data associated with the transaction from the customer segment, the activity segment, and the channel segment may be provided to a predictive model for scoring.
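A minimal sketch of population (410), storage (414), and extraction of model inputs follows, under stated assumptions: each hypothetical template is reduced to a list of field names, an in-memory list stands in for the data store, and `populate` and `extract_inputs` are illustrative helpers rather than part of any described interface.

```python
# Hypothetical templates: template ID -> field names that template stores.
TEMPLATE_FIELDS = {
    "CUST_IND": ["customer_id", "ssn_last4"],
    "CHAN_POS": ["terminal_id", "merchant_city"],
    "ACT_PUR": ["amount", "currency"],
}


def populate(selected: dict[str, str], raw: dict) -> dict[str, dict]:
    """Build one segment per selected template, copying only that template's fields."""
    return {seg: {"template": tid, **{f: raw.get(f) for f in TEMPLATE_FIELDS[tid]}}
            for seg, tid in selected.items()}


DATA_STORE: list[dict[str, dict]] = []  # stand-in for data store(s) 414


def extract_inputs(record: dict[str, dict], wanted: list[tuple[str, str]]) -> list:
    """Pull (segment, field) pairs out of a stored record as model inputs."""
    return [record[seg].get(field) for seg, field in wanted]


raw = {"customer_id": "C42", "ssn_last4": "1234", "terminal_id": "T9",
       "merchant_city": "Cary", "amount": 87.5, "currency": "USD", "unused": 1}
record = populate({"customer": "CUST_IND", "channel": "CHAN_POS", "activity": "ACT_PUR"}, raw)
DATA_STORE.append(record)
print(extract_inputs(DATA_STORE[0], [("customer", "customer_id"),
                                     ("channel", "terminal_id"),
                                     ("activity", "amount")]))
# ['C42', 'T9', 87.5]
```

Fields that no selected template defines (here `unused`) are simply not stored.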
In another embodiment, transaction data 404 may be received from a data provider in an already template formatted segment state. A client may perform segment template selection 406 and segment generation and population 410 based on details of a transaction for which the client wishes to provide data, or the client may use a consistent set of templates for a particular type of transaction or transactions that the client performs. In such a case, the enterprise data management system 402 may receive the formatted data and store the formatted data in the data store(s) 414 with little or no processing. Such a configuration is shown and described in detail in
The modular design illustrated in
The advantage of the enterprise data management system over traditional systems may be illustrated by analogy. Traditional systems are like a logographic writing system (e.g., Chinese), in which symbols stand for individual words: a unique layout must be created for each transaction type that one wishes to track. The enterprise data management system is like an alphabetic writing system, where the segments and templates provide a much more manageable alphabet that can be used to describe the many possible disparate transaction types.
For example, in a banking application, a number of channel templates may initially be included in a set of provided segment templates. If future technology allowed purchases and other transactions to be accomplished using a microchip implanted in a consumer's hand, the flexible design of the enterprise data management system enables tracking of such transactions without substantial changes to data storage schemas. By adding a new channel template that tracks certain data related to an implanted microchip channel (e.g., microchip identifier, microchip expiration date), transaction data related to an implanted microchip transaction can be stored along with transaction data associated with currently known channel types. This added functionality can be implemented without requiring any changes to customer, activity, or other templates.
Templates may also be edited as necessary or desired. For example, a bank may decide that it wishes to track a driver's license number in addition to a social security number for individual customers. The individual customer template can be edited to add a driver's license number field to accomplish this transition. No edits need to be made to other templates to accomplish this change.
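The two kinds of change described in the preceding paragraphs, adding a new channel template and editing an existing customer template, can be sketched against a hypothetical template registry. The registry layout, field names, and the helpers `add_template` and `add_field` are assumptions for illustration; the point is that each change touches exactly one template.

```python
# Hypothetical template registry: (segment, type) -> list of field names.
registry = {
    ("customer", "individual"): ["customer_id", "social_security_number"],
    ("channel", "point_of_sale"): ["terminal_id", "merchant_id"],
    ("channel", "online"): ["login_id", "session_id", "ip_address"],
}


def add_template(reg: dict, segment: str, type_name: str, fields: list[str]) -> None:
    """Register a new template; existing templates are never touched."""
    key = (segment, type_name)
    if key in reg:
        raise ValueError(f"template {key} already exists")
    reg[key] = list(fields)


def add_field(reg: dict, segment: str, type_name: str, field: str) -> None:
    """Edit one template in place by appending a field."""
    fields = reg[(segment, type_name)]
    if field not in fields:
        fields.append(field)


snapshot = {k: list(v) for k, v in registry.items()}
add_template(registry, "channel", "implanted_microchip",
             ["microchip_id", "microchip_expiration_date"])
add_field(registry, "customer", "individual", "drivers_license_number")

changed = sorted(k for k in registry if snapshot.get(k) != registry[k])
print(changed)  # [('channel', 'implanted_microchip'), ('customer', 'individual')]
```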
When transaction data is received, one or more templates are selected for each of the cars 902 of the cargo train based on attributes of the particular transaction associated with the transaction data. Thus, a customer template is selected for the customer segment car 904, and an account template is selected for the account segment car 908, with one or more templates being selected and populated for each of a plurality of the cars. Different templates may include different numbers and formats of fields used for storage. For example, an individual customer template may have 3 integer fields and 5 text string fields, while a company customer template may have 5 integer fields, 1 long integer field, 4 text string fields, 2 Boolean fields, and 1 real number field. Segment cars are filled using transaction data according to selected templates and are provided for storage according to those templates.
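The field layouts in the example above can be expressed as typed templates that a car is checked against when it is filled. This is a hypothetical sketch: the type tags are assumptions, and because Python has a single integer type, the "long" tag is kept only to mirror the layout described in the text.

```python
# Hypothetical layouts matching the field counts given in the text.
INDIVIDUAL_CUSTOMER = ["int"] * 3 + ["str"] * 5
COMPANY_CUSTOMER = ["int"] * 5 + ["long"] + ["str"] * 4 + ["bool"] * 2 + ["real"]

PY_TYPES = {"int": int, "long": int, "str": str, "bool": bool, "real": float}


def fill_car(layout: list[str], values: list) -> list:
    """Fill a segment car positionally, checking each value against its field type."""
    if len(values) != len(layout):
        raise ValueError(f"template expects {len(layout)} fields, got {len(values)}")
    for i, (kind, value) in enumerate(zip(layout, values)):
        # bool is a subclass of int in Python, so reject it for non-Boolean fields.
        is_stray_bool = kind != "bool" and isinstance(value, bool)
        if not isinstance(value, PY_TYPES[kind]) or is_stray_bool:
            raise TypeError(f"field {i}: expected {kind}, got {type(value).__name__}")
    return list(values)


print(len(INDIVIDUAL_CUSTOMER), len(COMPANY_CUSTOMER))  # 8 13
car = fill_car(INDIVIDUAL_CUSTOMER, [1, 2, 3, "Ann", "Lee", "Main St", "Cary", "NC"])
```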
Different cargo trains may have different numbers of cars depending on parameters of a desired application. Cargo trains may have more or fewer cars for containing data stored according to templates than depicted. Segments may have one or more templates from which to select. Some segments may always be stored according to the same one or more templates. For example, a common data segment may contain general (e.g., header) information about a particular transaction. A common data segment may contain information about which templates are used to store data for the other segments associated with a particular transaction. A common segment may also contain data such as a date and time associated with the transaction. Because these types of common data are constant across many or all transaction types, all common segments may be stored according to a limited number of common segment templates (e.g., 1 or 2).
Two checking account templates are selected for the account segment 1004 (i.e., the AQO and AQD templates). The account segment is populated according to those templates and contains data such as an account number, an account type, available balances, and account level daily limits. Two online banking channel templates are selected for the channel segment 1006 (i.e., the HQO and HOB templates). The channel segment is populated according to those templates and contains data such as an online banking login identifier, a session identifier, an IP address, and a web page that originated the transaction. A non-card related authentication template is selected for the authentication segment 1008. Because the transaction is an online banking transaction, authentication is performed by logging on to a password-protected online banking customer account and passing related security checks. The UNM authentication segment template is used to store the authentication method used and its results. Two scheduled bill payment/EFT templates are selected for the activity segment 1010 (i.e., the TSH and TPP templates). The activity segment is populated according to those templates and contains data such as payment amount, frequency, schedule start and end dates, bill payment reference number, payee identifier, payee name, payee type, and payee location. In the present example, no template is selected and no data is stored for the activity details segment 1012. This segment may be used for storing other data related to the transaction activity, such as certain data that a specific client wishes to track.
After templates have been selected for each of the segments, a common data segment 1014 is populated using the SMH and RQO templates. In this example, the common data segment is populated with data related to the particular transaction such as the customer type, account type, authentication type, channel type, activity type, and a date and time associated with the transaction. The populated data segments may be combined, as shown at 1016, for transport to certain portions of the enterprise data management system, such as one or more data stores for storage.
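The online bill payment example can be sketched as assembling the populated segments into one message. The template codes (AQO, AQD, HQO, HOB, UNM, TSH, TPP, SMH, RQO) come from the example; everything else is an assumption: the field names and values are hypothetical, JSON is an assumed wire format rather than the system's actual encoding, and the customer segment is omitted because its templates are not named in this passage.

```python
import json


def build_train(common: dict, segments: dict[str, tuple[list[str], dict]]) -> str:
    """Combine populated segments into one message for transport.

    `segments` maps a segment name to (template codes, field values). Segments
    with no selected template are left out.
    """
    cars = {"common": {"templates": ["SMH", "RQO"], **common}}
    for name, (codes, values) in segments.items():
        if codes:
            cars[name] = {"templates": codes, **values}
    return json.dumps(cars)


common = {"customer_type": "individual", "account_type": "checking",
          "authentication_type": "non_card", "channel_type": "online_banking",
          "activity_type": "scheduled_bill_payment", "timestamp": "2024-05-01T12:00:00Z"}
segments = {
    "account": (["AQO", "AQD"], {"account_number": "000123", "available_balance": 2500.0}),
    "channel": (["HQO", "HOB"], {"login_id": "jdoe", "session_id": "s-77",
                                 "ip_address": "203.0.113.5"}),
    "authentication": (["UNM"], {"method": "password", "result": "passed"}),
    "activity": (["TSH", "TPP"], {"payment_amount": 120.0, "frequency": "monthly",
                                  "payee_name": "Power Co"}),
    "activity_details": ([], {}),  # no template selected in this example
}
message = build_train(common, segments)
print(list(json.loads(message)))
# ['common', 'account', 'channel', 'authentication', 'activity']
```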
In the example of
The example of
Another example depicts a monthly account statement being sent to a business customer. The bank systems 1106 apply the API schema 1104 to describe the transaction in segments according to the selected templates 1108. The bank systems 1106 generate a common segment, a customer segment according to template X2 for a business customer, an account segment according to template A2 for a business account, a channel segment according to template H4 for the mail channel, and an activity segment according to template T5 for a monthly statement mailing activity.
The transactions are transmitted to a Universal SAS Connector (USC) 1110. The USC retrieves signatures from the signature database according to specific key fields (e.g., account number, card number, terminal identifier). A transaction and signature are forwarded to an on demand scoring engine (OSE) 1114. The OSE parses the transaction and splits the segments into individual fields. The OSE then commands the execution of scoring code. The scoring code commands execution of the appropriate scoring models 1116 to score the transaction. Multiple scores may be generated for a given transaction. The generated scores are returned to the OSE 1114. The OSE 1114 may process client defined rules and transmit results to the USC 1110. The USC may transmit updated signatures to the signature database 1112, and a copy of the transaction may be forwarded to a reporting history database 1118.
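The USC/OSE flow above can be sketched in a few functions. This is a hypothetical illustration, not the SAS connector's or scoring engine's actual interface: `usc_handle`, `ose_score`, the toy model, and the review rule are invented names, and the "signature" is assumed to be a running mean of transaction amounts per key.

```python
from typing import Callable

Model = Callable[[dict, dict], float]


def ose_score(txn: dict, signature: dict, models: list[Model]) -> list[float]:
    """OSE role: split the segments into individual fields, then run each model."""
    fields = {f"{seg}.{name}": value
              for seg, values in txn["segments"].items()
              for name, value in values.items()}
    return [model(fields, signature) for model in models]


def usc_handle(txn: dict, key: tuple[str, str], signature_db: dict,
               models: list[Model], rule: Callable[[list[float]], str],
               history: list) -> str:
    """USC role: fetch the signature by key field, score via the OSE, apply the
    client rule, write back the updated signature, and log to history."""
    seg, field = key
    key_value = txn["segments"][seg][field]
    sig = signature_db.get(key_value, {"n": 0, "mean_amount": 0.0})
    scores = ose_score(txn, sig, models)
    decision = rule(scores)
    # Illustrative signature update: running mean of amounts for this key.
    amount = txn["segments"]["activity"]["amount"]
    n = sig["n"] + 1
    signature_db[key_value] = {"n": n,
                               "mean_amount": sig["mean_amount"] + (amount - sig["mean_amount"]) / n}
    history.append({"txn": txn, "scores": scores, "decision": decision})
    return decision


def amount_ratio_model(fields: dict, sig: dict) -> float:
    """Toy model: ratio of this amount to the key's historical mean amount."""
    amount = fields["activity.amount"]
    return amount / (sig["mean_amount"] or amount)


def review_rule(scores: list[float]) -> str:
    """Toy client rule: send for review if any score exceeds 3."""
    return "review" if max(scores) > 3 else "approve"


signature_db, history, decisions = {}, [], []
for amount in (50.0, 60.0, 400.0):
    txn = {"segments": {"account": {"number": "A1"}, "activity": {"amount": amount}}}
    decisions.append(usc_handle(txn, ("account", "number"), signature_db,
                                [amount_ratio_model], review_rule, history))
print(decisions)  # ['approve', 'approve', 'review']
```

Scoring against the signature as it stood before the update keeps an unusual transaction from diluting its own baseline.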
For example, a model 1208 may accept certain inputs used to provide an enterprise data score 1210 (e.g., a fraud likelihood score). A metadata data extraction map may be used to direct extraction of data from the data store 1202 that is stored in segments according to different templates, so that values for different model inputs can be utilized by the model 1208 to generate one or more enterprise data scores 1210. The extracted data may then be aggregated at 1206 according to the problem being analyzed. Aggregation may be performed according to a number of parameters. For example, the number of raw data values to be used for different inputs to the model 1208 may be specified (e.g., aggregate and average the last 100 transaction values for a particular point of sale location).
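The extraction map and the "average the last 100 values per location" aggregation can be sketched as follows. The map format (model input mapped to a segment and field) and the record layout are assumptions; records are assumed to arrive in time order, and records whose templates lack a required input are skipped.

```python
from collections import defaultdict, deque

# Hypothetical extraction map: model input -> (segment, field) where it is stored.
EXTRACTION_MAP = {"pos_location": ("channel", "terminal_id"),
                  "amount": ("activity", "amount")}


def extract(record: dict, mapping: dict = EXTRACTION_MAP) -> dict:
    """Pull model inputs out of a record whose segments may use different templates."""
    return {name: record.get(seg, {}).get(field)
            for name, (seg, field) in mapping.items()}


def avg_last_n_by_location(records: list[dict], n: int = 100) -> dict:
    """Average the most recent n amounts per point of sale location."""
    windows = defaultdict(lambda: deque(maxlen=n))  # keeps only the last n values
    for record in records:
        row = extract(record)
        if row["pos_location"] is not None and row["amount"] is not None:
            windows[row["pos_location"]].append(row["amount"])
    return {loc: sum(w) / len(w) for loc, w in windows.items()}


records = [
    {"channel": {"terminal_id": "T1"}, "activity": {"amount": 10.0}},
    {"channel": {"terminal_id": "T1"}, "activity": {"amount": 30.0}},
    {"channel": {"login_id": "u7"}, "activity": {"amount": 99.0}},  # online template: no terminal
    {"channel": {"terminal_id": "T2"}, "activity": {"amount": 5.0}},
]
print(avg_last_n_by_location(records))  # {'T1': 20.0, 'T2': 5.0}
```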
To analyze whether certain point of sale locations have a high association with fraud, the extracted data may be aggregated at the channel level according to point of sale locations. The aggregated data 1212 may be scored at 1214 by one or more models 1208 to generate one or more enterprise data scores 1210. For example, an enterprise data score 1210 may be generated for each of a plurality of point of sale locations identifying a level of fraud associated with those locations.
Providing scoring of aggregated data offers a number of advantages. For example, scoring aggregated data enables identification of patterns not discernible on examination of non-aggregated data, such as single credit card-holder data. For a single account holder who has been defrauded (e.g., credit card or identity stolen), it may be difficult to pinpoint the exact transaction from which the fraud originated. However, analysis of aggregated data can bring to light patterns that may otherwise be unseen. For example, aggregating data at the channel level by point of sale locations may identify that a certain point of sale location is common among a number of defrauded customers (e.g., an unscrupulous shop owner steals credit card numbers from one individual customer and one company customer, checking account and routing numbers from one company customer, and debit card numbers from two individual customers). Looking at each of these customers in isolation may not identify the origin of the fraud. Additionally, looking at data aggregated at the point of sale channel level for each of the customer/account types may not identify the origin of the fraud because of the limited number of fraudulent transactions at the point of sale location using each customer/account type (i.e., one individual credit card account, one company credit card account, one company checking account, and two individual debit card accounts). However, analyzing data aggregated at the channel level by point of sale location enables identification of this example fraud by compiling enough data related to that point of sale (e.g., all five of the above described fraudulent transactions) to identify the pattern.
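The five-transaction example can be checked numerically. In this sketch, a raw count with an assumed threshold of 3 stands in for model scoring, and "POS-17" is a hypothetical shop identifier. Grouping by location together with customer/account type leaves every group below the threshold, while grouping by location alone surfaces the shop.

```python
from collections import Counter

# The five fraudulent transactions from the example, at one hypothetical shop:
# (point of sale, customer type, account type)
fraud_txns = [
    ("POS-17", "individual", "credit_card"),
    ("POS-17", "company", "credit_card"),
    ("POS-17", "company", "checking"),
    ("POS-17", "individual", "debit_card"),
    ("POS-17", "individual", "debit_card"),
]
THRESHOLD = 3  # assumed minimum count before a group stands out


def flagged_groups(txns, key, threshold=THRESHOLD):
    """Count transactions per aggregation key and return groups at or above threshold."""
    counts = Counter(key(t) for t in txns)
    return sorted(group for group, c in counts.items() if c >= threshold)


# Aggregated per location AND customer/account type: no group reaches the threshold.
print(flagged_groups(fraud_txns, key=lambda t: t))     # []
# Aggregated per location across all types: the shop stands out.
print(flagged_groups(fraud_txns, key=lambda t: t[0]))  # ['POS-17']
```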
Such aggregation and analysis is not possible using traditional systems. Traditional systems do not offer the flexibility to store transaction data associated with a large number of disparate transaction types (e.g., in a single data store). Thus, each of the different transaction types must be analyzed in isolation. Analyzing across disparate transaction types would require significant work to extract the data for those types from different data stores that use differing schemas. The enterprise data management system provides centralized storage of transaction data associated with potentially thousands of different transaction types. Such centralized storage, combined with the capacity to extract model input fields and aggregate them at different levels (e.g., segments according to segment attributes), enables much more powerful analysis than systems that store transaction data for individual transaction types in isolation.
At 1314, data from the stored segments may be aggregated and scored to generate an enterprise data score 1316. For example, a particular problem may be analyzed that uses data aggregated at a certain segment level (e.g., the authentication segment level by authentication type). The aggregated data score 1316 generated by a model may be output. Additionally, the aggregated data score 1316 and/or other data from the data store 1312 may be used to train a real-time predictive scoring model 1318. For example, a real-time data scoring operation 1318 may seek to generate a likelihood of fraud score for received transaction data 1302 associated with a particular transaction. Suppose that a data aggregation and scoring operation 1314 has previously used data aggregated at the authentication segment level by authentication type to discover that several transactions using the public key infrastructure (PKI) for authentication are associated with fraud. This discovery is manifested in an aggregated data score 1316 for the PKI authentication type. The real-time transaction data scoring model may be trained using the aggregated data score 1316 for the PKI authentication type. Thus, received transaction data 1302 for transactions using PKI authentication may be assigned a real-time transaction data score 1320 indicating a higher likelihood of fraud, based on the discovery in data aggregation and scoring 1314.
Data from the one or more data stores 1414 may be extracted, aggregated, and scored at 1416. For example, extracted data may be aggregated according to one or more attributes. In one example, an identification of the likelihood that transactions at different points of sale are fraudulent is sought. Data may be extracted from stored segments from the one or more data stores 1414 and aggregated at the channel level according to point of sale locations. The aggregated data may be scored by one or more models to generate an enterprise data score 1418. The enterprise data score 1418 may have value in itself and be outputted. For example, a listing of point of sale locations associated with fraud may be used as a list of locations to investigate for purposes of fraud recovery.
The enterprise data score 1418 may also have value in identifying future fraud. Thus, the enterprise data score 1418 may be provided for training a predictive model for evaluating real-time transaction data at 1420. The use of the enterprise data score 1418 may enable the real-time transaction data scoring model to act on patterns that may be unrecognizable using data directly from the data store 1414 alone. One or more real-time transaction data scoring models receive transaction data 1404 at 1420 to provide one or more real-time transaction data scores 1422. For example, the real-time transaction data score may be a real-number value between 0 and 1 identifying the likelihood that the new transaction being scored is associated with fraud. Transactions at point of sale locations that the data aggregation and scoring 1416 associated with fraud may receive real-time transaction data scores indicating a higher likelihood of fraud.
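One way the batch score can feed the real-time model is as an input feature of a logistic model whose output lies between 0 and 1. This is a hypothetical sketch: the per-location scores and the coefficients are invented for illustration (a real model would learn its weights during training), and logistic regression is used only as a familiar example of such a model. The same pattern would apply to an aggregated score for the PKI authentication type.

```python
import math

# Hypothetical batch output: enterprise data score per point of sale location.
LOCATION_FRAUD_SCORE = {"POS-17": 0.9, "POS-22": 0.05}

# Hypothetical coefficients; a trained model would learn these.
WEIGHTS = {"bias": -4.0, "amount_ratio": 0.5, "location_score": 5.0}


def realtime_score(txn: dict) -> float:
    """Logistic fraud score in (0, 1); the batch location score is one input feature."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["amount_ratio"] * txn["amount_ratio"]
         + WEIGHTS["location_score"] * LOCATION_FRAUD_SCORE.get(txn["pos"], 0.0))
    return 1.0 / (1.0 + math.exp(-z))


# The same transaction scored at two different locations.
print(round(realtime_score({"pos": "POS-17", "amount_ratio": 1.0}), 3))  # 0.731
print(round(realtime_score({"pos": "POS-22", "amount_ratio": 1.0}), 3))  # 0.037
```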
The combined storage of several different types of transaction data offers an enhanced platform for providing many different types of analyses.
For example, the new transaction 1602 may be an Internet banking password change. Upon receipt of the new transaction 1602, the real-time scoring 1604 determines the type of transaction, consults the transaction type-model rules 1608, and selects a model from the pool of models 1606. In this example, a fraud likelihood predictive model is selected. The transaction type-model rules dictate that an authorization model is not selected because this transaction involves no monetary amount to authorize. The real-time scoring applies the fraud likelihood predictive model to generate a score 1610 identifying whether the password change is likely fraudulent. If the password change is likely fraudulent, then the change may be prohibited. If the transaction score 1610 identifies a low likelihood of fraud, then the password change may be allowed.
As another example, the new transaction 1602 may be an ATM withdrawal. The real-time scoring 1604 determines the type of transaction, consults the transaction type-model rules 1608, and selects a fraud model and an authorization model from the pool of models. The authorization model scores the transaction and provides a score 1610. The fraud model has been trained using data from the enterprise database. The data from the enterprise database has identified the ATM associated with the new transaction 1602 as having a high association with fraud. Thus, this identification from the data in the enterprise database influences the fraud likelihood score 1610 for the new transaction 1602, making the score 1610 higher than it would be without the model training using the enterprise database data.
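The two examples above suggest a table-driven design for the transaction type-model rules. In this minimal sketch, the rule table, transaction type names, and toy models in the pool are hypothetical; only the idea of choosing models by transaction type comes from the text.

```python
# Hypothetical transaction type-model rules: which models apply to which type.
TYPE_MODEL_RULES = {
    "password_change": ["fraud"],                   # nothing monetary to authorize
    "atm_withdrawal": ["fraud", "authorization"],
}


def score_transaction(txn_type: str, fields: dict, model_pool: dict) -> dict:
    """Select models per the rules, then score the transaction with each one."""
    try:
        names = TYPE_MODEL_RULES[txn_type]
    except KeyError:
        raise ValueError(f"no model rule for transaction type {txn_type!r}") from None
    return {name: model_pool[name](fields) for name in names}


# Toy models standing in for the pool of models.
pool = {
    "fraud": lambda f: 0.8 if f.get("atm_flagged") else 0.1,
    "authorization": lambda f: 1.0 if f.get("amount", 0) <= f.get("available", 0) else 0.0,
}
print(score_transaction("password_change", {}, pool))  # {'fraud': 0.1}
print(score_transaction("atm_withdrawal",
                        {"atm_flagged": True, "amount": 200, "available": 500}, pool))
# {'fraud': 0.8, 'authorization': 1.0}
```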
Transaction data is received, parsed, and checked at 1702. Data may be extracted according to the inputs needed for each of the models relevant to the current transaction 1704, 1706. If the fraud model is to be used, then the fraud model suite 1708 is accessed, the transaction data is scored, and the scored transaction is outputted at 1710. If the credit risk model is to be used, then the credit risk model suite 1408 is accessed, the transaction data is scored, and the scored transaction is outputted at 1710.
A disk controller 1860 interfaces one or more optional disk drives to the system bus 1852. These disk drives may be external or internal floppy disk drives such as 1862, external or internal CD-ROM, CD-R, CD-RW or DVD drives such as 1864, or external or internal hard drives 1866. As indicated previously, these various disk drives and disk controllers are optional devices.
Each of the element managers, real-time data buffer, conveyors, file input processor, database index shared access memory loader, reference data buffer and data managers may include a software application stored in one or more of the disk drives connected to the disk controller 1860, the ROM 1856 and/or the RAM 1858. Preferably, the processor 1854 may access each component as required.
A display interface 1868 may permit information from the bus 1852 to be displayed on a display 1870 in audio, graphic, or alphanumeric format. Communication with external devices may optionally occur using various communication ports 1872.
In addition to the standard computer-type components, the hardware may also include data input devices, such as a keyboard 1872, or other input device 1874, such as a microphone, remote control, pointer, mouse and/or joystick.
As additional examples, the systems and methods may include data signals conveyed via networks (e.g., local area network, wide area network, internet, combinations thereof, etc.), fiber optic medium, carrier waves, wireless networks, etc. for communication with one or more data processing devices. The data signals can carry any or all of the data disclosed herein that is provided to or from a device.
Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods and operations described herein. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
The systems' and methods' data (e.g., associations, mappings, data input, data output, intermediate data results, final data results, etc.) may be stored and implemented in one or more different types of computer-implemented data stores, such as different types of storage devices and programming constructs (e.g., RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
It should be understood that as used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of “and” and “or” include both the conjunctive and disjunctive and may be used interchangeably unless the context expressly dictates otherwise; the phrase “exclusive or” may be used to indicate a situation where only the disjunctive meaning may apply.