This disclosure relates generally to processing electronic transactions, and, more specifically, to techniques for training a transaction classifier to classify transactions, e.g., for transaction security.
Fraudulent electronic transactions may cause substantial loss and security vulnerabilities. Transactions identified as fraudulent may be appropriately labeled and used to detect and address subsequent fraudulent transactions. For example, using traditional techniques, a security system may classify transactions using a model that is trained based on pre-transaction information from older transactions for which labels are known.
This specification includes references to various embodiments, to indicate that the present disclosure is not intended to refer to one particular implementation, but rather a range of embodiments that fall within the spirit of the present disclosure, including the appended claims. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation—[entity] configured to [perform one or more tasks]—is used herein to refer to structure (i.e., something physical, such as an electronic circuit). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. A “transaction processing system configured to classify one or more transactions” is intended to cover, for example, a computer system that performs this function during operation, even if it is not currently being used (e.g., when its power supply is not connected). Thus, an entity described or recited as “configured to” perform some task refers to something physical, such as a device, circuit, memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible.
The term “configured to” is not intended to mean “configurable to.” An unprogrammed mobile computing device, for example, would not be considered to be “configured to” perform some specific function, although it may be “configurable to” perform that function. After appropriate programming, the mobile computing device may then be configured to perform that function.
Reciting in the appended claims that a structure is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that claim element. Accordingly, none of the claims in this application as filed are intended to be interpreted as having means-plus-function elements. Should Applicant wish to invoke Section 112(f) during prosecution, it will recite claim elements using the “means for” [performing a function] construct.
As used herein, the terms “first,” “second,” etc. are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unless specifically stated. For example, in a computing system having multiple user accounts, the terms “first” and “second” user accounts can be used to refer to any two user accounts. In other words, the “first” and “second” user accounts are not limited to the initial two created user accounts, for example. When used herein, the term “or” is used as an inclusive or and not as an exclusive or. For example, the phrase “at least one of x, y, or z” means any one of x, y, and z, as well as any combination thereof (e.g., x and y, but not z; or x, y, and z).
As used herein, the term “based on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be solely based on specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor and is used to determine A or affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is synonymous with the phrase “based at least in part on.”
As used herein, the term “processing element” refers to various elements configured to execute program instructions (or portions thereof or combinations thereof). Processing elements include, for example, circuits such as an ASIC (Application Specific Integrated Circuit), portions or circuits of individual processor cores, entire processor cores, individual processors, programmable hardware devices such as a field programmable gate array (FPGA), and/or larger portions of systems that include multiple processors, as well as any combinations thereof.
Techniques are disclosed for leveraging post-transaction data to train machine learning models using immature transaction data. “Immature” transactions are those for which relevant labels are not yet known or have not yet been generated, and are typically fairly recent. For example, these transactions may have been completed, but may still be within a review interval, after which they may be marked as proper or fraudulent. In contrast, relevant classifier labels are known for mature transactions. Traditionally, data from immature transactions has not been used to train machine learning classifiers. Further, traditional training techniques typically do not utilize post-transaction data.
In contrast, in disclosed embodiments, post-transaction data for mature and immature transactions is used to generate labels for a subset of high-confidence immature transactions. These labels are then used to train one or more classifiers to classify transactions in a production environment. As discussed in further detail below, this may involve multiple training procedures using different transaction classifiers.
In some situations, the disclosed techniques may improve identification of fraudulent transactions by incorporating immature transactions into classifier training. For example, consider a set of transactions that includes transactions A, B, and C, where transaction A is a fraudulent transaction that is mature (e.g., has a known label) and is relatively old compared to transactions B and C, transaction B is immature and relatively recent, and transaction C is a current transaction being categorized. Using traditional techniques, transaction B would not be used to train a classification model prior to transaction C due to labels being unknown for transaction B. In disclosed techniques, however, post-transaction data for transaction A may be leveraged to generate a label for transaction B, which may be used to train a transaction classifier to classify production transactions, such as transaction C.
Leveraging post-transaction data for older transactions to generate labels for recent transactions that are then used to train a classification model may advantageously allow the trained classification model to classify production transactions (e.g., determine whether transactions are fraudulent) more accurately than traditional techniques. This may allow a security system to initiate security actions for detected fraudulent transactions, such as preventing the transactions from occurring, flagging the transactions for additional review or classification, or prompting additional authentication for the transactions.
Initial transaction classifier 110, in the illustrated embodiment, receives pre- and post-transaction data for a first set of transactions 114 and provides classifications for transactions in the first set to training module 112. In some embodiments, the first set of transactions 114 includes one or more older transactions that were completed at least a threshold amount of time prior to the current time and for which labels have been generated.
Training module 112, in the illustrated embodiment, compares known training labels 116 for the first set of transactions 114 with the output of classifier 110. Based on the comparison, training module 112 provides feedback to initial transaction classifier 110. In some embodiments, the feedback from training module 112 includes one or more adjusted training weights for classifier 110. For example, classifier 110 may be a neural network that generates output values between 0 and 1 for various transactions and training module 112 may adjust training weights based on the difference between the output values and the labels. In other embodiments, any of various types of feedback control may be implemented to train various classifier types.
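As an illustrative sketch only (not part of the disclosed embodiments), the feedback loop described above — comparing a 0-to-1 classifier output against a known label and adjusting weights based on the difference — might look like the following; the feature values, learning rate, and iteration count are hypothetical:

```python
import math

def train_step(weights, features, label, learning_rate=0.1):
    """One feedback iteration: compare the classifier's 0-to-1 output
    against the known label and adjust the weights by the error."""
    z = sum(w * x for w, x in zip(weights, features))
    output = 1.0 / (1.0 + math.exp(-z))   # neural-net-style output in (0, 1)
    error = output - label                # difference between output and label
    return [w - learning_rate * error * x for w, x in zip(weights, features)]

# Hypothetical mature transaction with a known fraud label of 1.
weights = [0.0, 0.0, 0.0]
features = [0.5, -1.2, 0.3]
for _ in range(200):
    weights = train_step(weights, features, label=1.0)
```

After repeated feedback steps, the classifier's output for this transaction moves toward its label, mirroring the weight-adjustment feedback described for training module 112.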
As used herein, the term “pre-transaction information” refers to information available to a classifier prior to the relevant pending transaction being complete. Thus, pre-transaction information may include information received after the transaction has been initiated but before the transaction is complete. Pre-transaction information may include, for example, data from transactions that are pending or complete before completion of the current pending transaction, as well as non-transaction information that is independent of the transaction, such as information associated with a user who initiated the transaction (e.g., user activity, user location, etc.). Various pre- or post-transaction information may be used (with or without pre-processing) to generate features that are input to classifiers 110, 120, or 130.
As used herein, the term “post-transaction information” refers to information that is not available until after the relevant transaction is complete. In some embodiments, post-transaction information includes data for a transaction that is initiated after the current pending transaction is complete. Additionally, post-transaction information may include non-transaction information such as other user activity. Thus, post-transaction information for a particular transaction may include one or more of the following attributes: activity of a user associated with the particular transaction (e.g., on one or more devices), location information of devices involved in the particular transaction (e.g., transaction source and destination), clicking or scrolling activity of the user, currency amount of one or more transactions following the particular transaction, content of the transaction (e.g., monetary or item-based transaction), user information (e.g., username and password), etc. Post-transaction information may be obtained by an administrator of a transaction security system, for example. Speaking generally, various types of data may be categorized as pre- or post-transaction data based on when they are obtained. Traditionally, because post-transaction data is not available for live transactions being classified, post-transaction data has not been used to train machine learning classifiers.
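A minimal sketch of categorizing observed data as pre- or post-transaction based on when it is obtained, as described above (the event names and record structure here are hypothetical):

```python
from datetime import datetime

def split_pre_post(transaction_complete_time, events):
    """Partition observed events into pre- and post-transaction information
    based on when each event was observed relative to transaction completion."""
    pre, post = [], []
    for event in events:
        target = pre if event["observed_at"] < transaction_complete_time else post
        target.append(event)
    return pre, post

# Hypothetical transaction completed at noon on June 1st.
complete = datetime(2019, 6, 1, 12, 0)
events = [
    {"name": "user_login", "observed_at": datetime(2019, 6, 1, 11, 0)},  # before completion
    {"name": "chargeback", "observed_at": datetime(2019, 7, 1, 9, 0)},   # after completion
]
pre, post = split_pre_post(complete, events)
```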
Trained initial transaction classifier 120, in the illustrated embodiment, receives both pre- and post-transaction data for a second set of transactions 122 (e.g., a set of immature transactions). Classifier 120 then generates classifier outputs 124 that include classification values for transactions in the second set of transactions 122. In some embodiments, a filtering module may filter classifier outputs 124 to determine a subset of the second set of transactions. In some embodiments, the subset includes transactions in the second set of transactions with classifier output values that satisfy a confidence threshold. For example, the output values for these transactions may be within a threshold difference from one or more expected classifier values.
Second transaction classifier 130, in the illustrated embodiment, receives pre-transaction data for transactions in the subset of the second set of transactions that have a threshold confidence. Based on this input, second transaction classifier 130 generates classifier output values and sends them to training module 132. In some embodiments, classifiers 110 and 130 are trained using similar machine learning techniques. For example, one or more of the following machine learning techniques may be used to train classifiers 110 and 130: neural networks, ensemble methods, regression (e.g., linear or logistic), clustering (e.g., k-means), classification (e.g., naïve Bayes), etc.
Training module 132, in the illustrated embodiment, receives classifier labels 136 generated for high-confidence transactions included in the subset 134 of the second set of transactions. In some embodiments, a filtering module (e.g., module 322 discussed below) or some other module generates labels for high-confidence transactions based on the output values of trained initial transaction classifier 120. In the illustrated example, training module 132 compares the classifier output values from classifier 130 with the labels 136 for high-confidence transactions. Training module 132 provides training feedback to classifier 130 including adjustments to training weights.
Note that, although the second transaction classifier 130 receives data only for the subset of the second set of transactions in the illustrated embodiment, this classifier may be trained based on various other types of training data in addition, e.g., pre-transaction data from the mature transactions used to train the initial transaction classifier 110. As discussed above, the disclosed techniques may allow second transaction classifier 130 to be trained based on immature transactions, which may improve its accuracy relative to traditional techniques, e.g., by incorporating data from malicious trends earlier than traditional techniques.
Pre- and post-transaction information used to train transaction classifiers and to classify one or more electronic transactions may be obtained from specific time intervals for a particular classification system. For example, a training system may obtain mature transaction data from an earlier time interval than immature transaction data. Note that the specific time intervals from which training data is obtained may vary in length, depending on the training or classifying being performed, transaction volume, etc.
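One possible sketch of obtaining mature and immature transactions from different time intervals, assuming a hypothetical 30-day review interval after which labels become available (the interval length and record structure are illustrative assumptions, not specifics from the disclosure):

```python
from datetime import datetime, timedelta

def partition_by_maturity(transactions, now, maturity_threshold=timedelta(days=30)):
    """Split transactions into mature (old enough that labels exist) and
    immature (too recent for labels) sets, based on a review interval."""
    mature = [t for t in transactions if now - t["completed_at"] >= maturity_threshold]
    immature = [t for t in transactions if now - t["completed_at"] < maturity_threshold]
    return mature, immature

now = datetime(2019, 11, 20)
transactions = [
    {"id": "A", "completed_at": datetime(2019, 9, 1)},    # ~80 days old: mature
    {"id": "B", "completed_at": datetime(2019, 11, 10)},  # ~10 days old: immature
]
mature, immature = partition_by_maturity(transactions, now)
```

In practice, the threshold would vary with the training or classifying being performed, transaction volume, etc., as noted above.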
Interval 210A, in the illustrated example, includes mature transactions for which training labels are available and interval 210B includes immature transactions for which training labels are not available. Note that intervals 210 may include any number of transactions and that a particular user or account initiating a transaction in interval 210A may also initiate a transaction in interval 210B. In some embodiments, a training system selects interval 210A such that it is a threshold distance in time from interval 210B and such that labels are available (e.g., mature) for transactions within interval 210A. For transaction 212A, within interval 210A, post-transaction data is shown as information that is available within an interval 214A that extends from when transaction 212A is initiated to the current time 220. Similarly, post-transaction data for transaction 212B is shown as information that is available within an interval 214B that extends from when transaction 212B is initiated to the current time 220.
In some embodiments, post-transaction data within interval 214A for transaction 212A used for training is limited to include only transaction information from a length of time similar to the interval 214B from which post-transaction data is available for transaction 212B (e.g., the post-transaction data for transaction 212A is selected from a smaller time interval than that shown in the illustrated example).
In the illustrated example, initial transaction classifier 110 receives pre- and post-transaction data for transactions in interval 210A (including transaction 212A) and provides classifier output values to training module 112. Classifier 110 receives control signaling from training module 112 based on training labels for transactions in interval 210A. Once classifier 110 satisfies a training threshold, it is referred to as trained initial transaction classifier 120, in the illustrated example. Trained initial transaction classifier 120 receives pre- and post-transaction data for a transaction in interval 210B and provides classifier output to filtering module 322.
Filtering module 322, in the illustrated embodiment, determines classifier output values that satisfy a threshold confidence (these values are associated with a high classifier confidence) and selects a subset of transactions in interval 210B. For example, trained initial transaction classifier 120 may output values between 0 and 1. In this example, classifier output values within the ranges 0-0.2 and 0.8-1 may meet a confidence threshold, and transactions associated with these high-confidence output values may be included in the subset of transactions selected by filtering module 322. In some embodiments, the filtering module 322 generates labels for transactions in the subset based on the classification values for the transactions. For example, for a particular transaction whose classifier output is 0.2, filtering module 322 assigns a label of 0 to the transaction. Leveraged transaction classifier 330 then receives pre-transaction data for transactions in the selected subset of interval 210B.
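The confidence filtering and label generation performed by a module such as filtering module 322 might be sketched as follows, using the 0-0.2 and 0.8-1 ranges from the example above (the data structures are hypothetical):

```python
def filter_and_label(transactions, outputs, low=0.2, high=0.8):
    """Keep only transactions whose classifier output is near an expected
    value (0 or 1), and assign that nearest value as the generated label."""
    subset = []
    for txn, out in zip(transactions, outputs):
        if out <= low:
            subset.append((txn, 0))   # high confidence: not fraudulent
        elif out >= high:
            subset.append((txn, 1))   # high confidence: fraudulent
        # outputs strictly between `low` and `high` are discarded as low confidence
    return subset

subset = filter_and_label(["t1", "t2", "t3"], [0.05, 0.55, 0.93])
# t2 (output 0.55) is filtered out as low confidence
```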
Leveraged transaction classifier 330, in the illustrated embodiment, sends classifier output to training module 132 that includes classification values for the subset of transactions. Training module 132 sends training feedback to leveraged transaction classifier 330 based on labels generated for high-confidence transactions (included in the subset selected by filtering module 322). Note that leveraged transaction classifier 330 is one example of the second transaction classifier 130 discussed above.
In the illustrated embodiment, an arrow shows potential time intervals in which pre-transaction data 222 may be available for transactions that are initiated at the current time 220. Note that all or a portion of this transaction data may be selected for classifying, using the leveraged transaction classifier 330, one or more transactions that are initiated at or after the current time 220. For example, a portion of the pre-transaction data 222 extending from transaction 212B to transaction 212A may be used by leveraged transaction classifier 330 to classify transactions. Leveraged transaction classifier 330 may be updated periodically, in some embodiments, using transactions from updated time intervals.
Note that various examples herein classify transactions as fraudulent or not, but these examples are discussed for purposes of explanation and are not intended to limit the scope of the present disclosure. In other embodiments, any of various classifications may be implemented.
Trained leveraged transaction classifier 410, shown in the illustrated example, is one example of leveraged transaction classifier 330, e.g., that has been trained and satisfies one or more training thresholds.
Traditional transaction classifier 420, shown in the illustrated example, is one example of a machine learning model that has been trained using traditional techniques (e.g., without using post-transaction data). Training of classifier 420 may include using pre-transaction data for transactions for which training labels are known. For example, these transactions are typically older transactions relative to a current time, such as those included in interval 210A, discussed above.
In the illustrated example, ensemble module 430 receives classifier outputs 406 from classifiers 410 and 420. Based on these outputs 406, module 430 generates one or more classification outputs 408 using one or more ensemble methods. Ensemble module 430 may, for example, analyze the outputs of multiple classifiers and aggregate them to increase the proportion of correctly classified transactions relative to the outputs of the individual classifiers. For example, a classifier training system may use one or more of the following ensemble methods to combine classifier outputs from two or more of the same or different transaction classifiers: random forest models, bootstrap aggregating, boosting (e.g., AdaBoost), Bayesian parameter averaging, Bayesian model combination, etc. The classification output(s) 408 generated by ensemble module 430 may advantageously increase the number of correctly identified transactions relative to outputs generated by traditional classifiers, such as classifier 420, for example.
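As a minimal sketch of one simple ensemble method — weighted averaging of two classifiers' outputs — with all output values hypothetical (the disclosure also names more sophisticated methods such as bagging and boosting):

```python
def ensemble_average(outputs_a, outputs_b, weight_a=0.5):
    """Combine two classifiers' 0-to-1 outputs by weighted averaging,
    one simple ensemble method among those named above."""
    return [weight_a * a + (1 - weight_a) * b
            for a, b in zip(outputs_a, outputs_b)]

leveraged = [0.9, 0.1, 0.7]    # hypothetical outputs of a leveraged classifier
traditional = [0.7, 0.3, 0.4]  # hypothetical outputs of a traditional classifier
combined = ensemble_average(leveraged, traditional)
predictions = [1 if c >= 0.5 else 0 for c in combined]
```

In the third hypothetical transaction, the traditional classifier alone would predict "not fraudulent" (0.4), but the combined output crosses the 0.5 decision threshold.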
Traditional transaction classification techniques may identify fraudulent transactions that follow a trend only after a certain time interval (e.g., once transactions that match the trend have matured). For example, labels for transactions are often not available until a month or three months after the transactions occur. Therefore, traditional classification techniques may have a ramp-up time in which fraudulent transactions are not detected. Using the disclosed multi-classifier techniques to leverage post-transaction data for older transactions to generate labels for more recent immature transactions may advantageously allow a security system to detect additional fraudulent transactions that follow an identified trend.
In the illustrated embodiment, interval 510B is between two and four weeks prior to time 520T. Similarly, interval 510A is between two and four weeks prior to 520S and is the same length as interval 510B. In some embodiments, intervals 510A and 510B are different lengths. In the illustrated example, post-transaction data for two different example transactions 512A and 512B are selected from intervals of time that are the same length. The interval from which pre-transaction data for transaction 512B may be selected, however, spans a longer potential length of time than the interval associated with transaction 512A. In some embodiments, pre-transaction data for transaction 512B are selected from an interval with the same length as a time interval from which pre-transaction data for transaction 512A are selected. Note that transaction 512B is included in a high-confidence subset of transactions within interval 510B.
In one example situation, a user may create an account and complete a first transaction on May 12th. Using the same account, the user completes a second transaction on May 14th. On June 12th, a transaction security system initiates a chargeback process for the first transaction and marks this transaction as fraudulent. Using the same account, the user completes a third transaction on June 24th. In this example, characteristics of the third transaction and user activity related to the third transaction are similar to the second transaction. On July 27th, the transaction security system initiates a chargeback process for the second transaction and marks the transaction as fraudulent. Using traditional classification techniques, in this example, the transaction security system may not be able to identify a trend in the first two transactions and classify the third transaction, initiated by this same user, as fraudulent prior to the third transaction being complete. Using the disclosed techniques, however, the system may identify the trend of the first two transactions and predict that the third transaction will also be fraudulent, allowing the transaction security system to block subsequent transactions initiated by the user account or other accounts. In this example, the transaction security system may be able to label the second transaction as fraudulent earlier (e.g., prior to June 24th), using the new classifier model, and may, therefore, identify the third transaction as fraudulent based on the second transaction being fraudulent.
At 610, in the illustrated embodiment, a computer system trains an initial transaction classifier based on pre-transaction data and post-transaction data for a first set of transactions for which training labels have been generated.
At 620, the computer system inputs, to the trained initial transaction classifier, pre-transaction data and post-transaction data for a second set of transactions for which training labels have not been generated, where the trained initial transaction classifier generates classifier outputs based on the input. In some embodiments, transactions in the first set of transactions occur during a first time interval and transactions in the second set of transactions occur in a second time interval that does not overlap with the first time interval, where the second time interval is later in time than the first time interval. In some embodiments, transactions in the second set of transactions occur in a second time interval that begins at least a month after the end of the first time interval. In some embodiments, a transaction in the first time interval occurs at least one month prior to the current time.
In some embodiments, the post-transaction data for the first set of transactions is selected from a first time interval whose length is determined based on a time difference between a transaction in the second set of transactions and the current time. In some embodiments, the first time interval and the second time interval are the same length. For example, post-transaction data used to train the initial transaction classifier and post-transaction data used to train the second classifier are selected from two different time intervals whose lengths are the same.
In some embodiments, pre-transaction data for the first and second sets of transactions includes account credentials for an account associated with one or more transactions in the first and second sets of transactions, where post-transaction data for at least a first transaction in the second set of transactions includes activity of a user of the account subsequent to the first transaction being complete. In some embodiments, pre-transaction data for at least a first transaction in the second set of transactions includes transaction data associated with one or more transactions that were initiated prior to the first transaction in the second set of transactions, where post-transaction data for at least a first transaction in the second set of transactions includes location information of a user device that initiated the first transaction subsequent to the first transaction being complete. For example, post-transaction data used to generate classifier outputs using the trained initial transaction classifier may include geofencing information for a device of a user who completes transactions being classified by the trained initial transaction classifier. As another example, pre-transaction information for a particular transaction may include information associated with any number of transactions initiated prior to the particular transaction.
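As a rough illustration of how pre- and post-transaction attributes such as those above might be assembled into classifier features (all field names here are hypothetical, not drawn from the disclosure):

```python
def build_pre_features(txn, prior_txns, user):
    """Assemble pre-transaction features: account metadata and data from
    transactions initiated before this one (field names hypothetical)."""
    return {
        "account_age_days": user["account_age_days"],
        "num_prior_txns": len(prior_txns),
        "avg_prior_amount": (sum(t["amount"] for t in prior_txns) / len(prior_txns)
                             if prior_txns else 0.0),
        "amount": txn["amount"],
    }

def build_post_features(later_activity):
    """Assemble post-transaction features: user activity and device locations
    observed after the transaction completed (field names hypothetical)."""
    return {
        "later_logins": sum(1 for a in later_activity if a["type"] == "login"),
        "later_locations": len({a["location"] for a in later_activity}),
    }

pre = build_pre_features(
    {"amount": 50.0},
    [{"amount": 10.0}, {"amount": 30.0}],
    {"account_age_days": 3},
)
post = build_post_features(
    [{"type": "login", "location": "X"}, {"type": "click", "location": "Y"}],
)
```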
At 630, the computer system selects a subset of the second set of transactions whose classifier outputs meet a confidence threshold. For example, classifier outputs between 0.8 and 1 or between 0 and 0.2 may satisfy the confidence threshold, and transactions associated with these outputs may be included in the subset.
At 640, the computer system generates training labels for transactions in the selected subset based on their classifier outputs. In some embodiments, the training labels specify whether transactions in the selected subset are fraudulent.
At 650, the computer system trains a second transaction classifier based on pre-transaction data for the selected subset and the generated training labels. In some embodiments, training the second transaction classifier does not include training based on post-transaction data. In some embodiments, training the second transaction classifier is performed using one or more supervised machine learning techniques. In some embodiments, at least fifty percent of the post-transaction data for transactions in the selected subset is not used to train the second transaction classifier. For example, the second transaction classifier may be used to classify transactions for which post-transaction data is limited or does not exist. Therefore, in this example, training the second transaction classifier is performed with little or no post-transaction data.
At 660, the computer system stores configuration parameters for the trained second transaction classifier. In some embodiments, a transaction processing system classifies, subsequent to the storing, one or more transactions using the trained second transaction classifier. In some embodiments, the one or more transactions are initiated after transactions in the second set of transactions are complete.
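The method of elements 610-660 can be sketched end to end as follows. This is a toy illustration only: a minimal logistic-regression trainer stands in for both classifiers, and all feature values, thresholds, and the two-feature [pre, post] encoding are hypothetical assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, steps=500):
    """Minimal logistic-regression trainer standing in for either classifier."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def with_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

# 610: train the initial classifier on pre- and post-transaction features
# of mature transactions with known labels.
X_mature = np.array([[1.0, 0.9], [0.9, 1.0], [0.0, 0.1], [0.1, 0.0]])  # [pre, post]
y_mature = np.array([1.0, 1.0, 0.0, 0.0])
w_initial = train_logistic(with_bias(X_mature), y_mature)

# 620: score immature transactions (features available, labels not).
X_immature = np.array([[0.95, 0.9], [0.5, 0.5], [0.05, 0.1]])
scores = sigmoid(with_bias(X_immature) @ w_initial)

# 630-640: keep high-confidence scores and generate labels from them.
keep = (scores <= 0.2) | (scores >= 0.8)
labels = (scores[keep] >= 0.8).astype(float)

# 650: train the second classifier on pre-transaction features only.
X_pre = X_immature[keep][:, :1]
w_second = train_logistic(with_bias(X_pre), labels)

# 660: store configuration parameters for the trained second classifier.
config = {"weights": w_second.tolist()}
```

The stored parameters could then be loaded by a transaction processing system to classify production transactions from their pre-transaction features alone, consistent with element 660 and the classification discussed above.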
In some embodiments, the trained second transaction classifier is usable to predict whether transactions received by a production transaction computer system are fraudulent. In some embodiments, the computer system generates final classifier outputs based on classifier outputs from a plurality of trained transaction classifiers. In some embodiments, the plurality of trained transaction classifiers includes the trained second transaction classifier and a third transaction classifier that is not trained using post-transaction data. For example, the computer system may ensemble the trained second transaction classifier and a traditional transaction classifier (e.g., one that is not trained using post-transaction data) using one or more ensemble methods to generate final classifier outputs.
In various embodiments, processing unit 750 includes one or more processors. In some embodiments, processing unit 750 includes one or more coprocessor units. In some embodiments, multiple instances of processing unit 750 may be coupled to interconnect 760. Processing unit 750 (or each processor within 750) may contain a cache or other form of on-board memory. In some embodiments, processing unit 750 may be implemented as a general-purpose processing unit, and in other embodiments it may be implemented as a special purpose processing unit (e.g., an ASIC). In general, computing device 710 is not limited to any particular type of processing unit or processor subsystem.
As used herein, the term “module” refers to circuitry configured to perform specified operations or to physical non-transitory computer readable media that store information (e.g., program instructions) that instructs other circuitry (e.g., a processor) to perform specified operations. Modules may be implemented in multiple ways, including as a hardwired circuit or as a memory having program instructions stored therein that are executable by one or more processors to perform the operations. A hardware circuit may include, for example, custom very-large-scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like. A module may also be any suitable form of non-transitory computer readable media storing program instructions executable to perform specified operations.
Storage subsystem 712 is usable by processing unit 750 (e.g., to store instructions executable by and data used by processing unit 750). Storage subsystem 712 may be implemented by any suitable type of physical memory media, including hard disk storage, floppy disk storage, removable disk storage, flash memory, random access memory (RAM-SRAM, EDO RAM, SDRAM, DDR SDRAM, RDRAM, etc.), ROM (PROM, EEPROM, etc.), and so on. Storage subsystem 712 may consist solely of volatile memory, in one embodiment. Storage subsystem 712 may store program instructions executable by computing device 710 using processing unit 750, including program instructions executable to cause computing device 710 to implement the various techniques disclosed herein.
I/O interface 730 may represent one or more interfaces and may be any of various types of interfaces configured to couple to and communicate with other devices, according to various embodiments. In one embodiment, I/O interface 730 is a bridge chip from a front-side to one or more back-side buses. I/O interface 730 may be coupled to one or more I/O devices 740 via one or more corresponding buses or other interfaces. Examples of I/O devices include storage devices (hard disk, optical drive, removable flash drive, storage array, SAN, or an associated controller), network interface devices, user interface devices or other devices (e.g., graphics, sound, etc.).
Various articles of manufacture that store instructions (and, optionally, data) executable by a computing system to implement techniques disclosed herein are also contemplated. The computing system may execute the instructions using one or more processing elements. The articles of manufacture include non-transitory computer-readable memory media. The contemplated non-transitory computer-readable memory media include portions of a memory subsystem of a computing device as well as storage media or memory media such as magnetic media (e.g., disk) or optical media (e.g., CD, DVD, and related technologies, etc.). The non-transitory computer-readable media may be either volatile or nonvolatile memory.
Although specific embodiments have been described above, these embodiments are not intended to limit the scope of the present disclosure, even where only a single embodiment is described with respect to a particular feature. Examples of features provided in the disclosure are intended to be illustrative rather than restrictive unless stated otherwise. The above description is intended to cover such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.
The scope of the present disclosure includes any feature or combination of features disclosed herein (either explicitly or implicitly), or any generalization thereof, whether or not it mitigates any or all of the problems addressed herein. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the appended claims.
Number | Date | Country | Kind |
---|---|---|---
PCT/CN2019/119675 | Nov 2019 | CN | national |
The present application claims priority to PCT Appl. No. PCT/CN2019/119675, filed Nov. 20, 2019, which is incorporated by reference herein in its entirety.