In many financial and industrial applications, there is a need to detect and respond to rare events as they occur in the time-series behavior of entities. Even with a relatively large team of investigators, only a small fraction of entities can be explored, due to the expense of time and resources for investigation and remediation.
In some implementations, the current subject matter relates to a computer-implemented method for detecting a diverse set of rare behaviors. The method may include processing, using at least one processor, time-series data received from a plurality of time-series data sources. The time-series data may represent one or more actions executed by an entity in a plurality of entities and stored by at least one time-series data source in the plurality of time-series data sources. The method may further include generating a data structure corresponding to the entity. The generated data structure may identify the entity and include one or more representations of processed time-series data identifying one or more actions executed by the entity. A current action executed by the entity may be detected, and one or more current time-series data corresponding to the current action and associated with the data structure corresponding to the entity may be received. The method may also include extracting one or more first features from the generated data structure based on the one or more current time-series data, comparing the one or more extracted first features and one or more second features extracted for at least another entity in the plurality of entities, and determining, based on the comparing, one or more difference parameters indicative of differences between the selected one or more first and second features. The method may further include training one or more models using the one or more difference parameters, determining, using the one or more trained models, a score for each of the one or more actions executed by the at least one entity, identifying at least one action in the one or more actions based on the determined scores, updating the training of the one or more models in response to receiving feedback data responsive to the identified at least one action, and identifying at least another action in the one or more actions.
In some implementations, the current subject matter can include one or more of the following optional features. At least one of the first features and the second features may include one or more latent features. The training of the models may be performed using the selected first and second features.
In some implementations, the training may include selecting at least one over- and under-representation of a training exemplar or no change to representation.
In some implementations, the feedback data may include feedback data responsive to a utility of the identified at least one action.
In some implementations, the processing may include monitoring the actions executed by the entity, and receiving the time-series data from the plurality of time-series data sources. The actions, behaviors and/or state of the entity may be summarized by one or more representations and may include at least one previously executed action (e.g., historical actions by the entity).
In some implementations, the time-series data may be received during at least one of the following time periods: one or more periodic time intervals, one or more irregular time intervals, and any combination thereof. The time-series data may represent one or more actions executed by the entity during a predetermined period of time.
In some implementations, at least one entity and at least another entity may include at least one of the following: related entities, unrelated entities, and any combination thereof.
In some implementations, one or more difference parameters of the representations may include at least one of the following: latent parameters determined for least comparable entities, parameters determined for most comparable entities, and any combination thereof. This may include a diversity metric for least/most likely entities.
In some implementations, at least another identified action may include at least one of the following: an action identified in addition to the at least one identified action, an action identified for replacing at least one identified action, no action, and any combination thereof (e.g., feedback requests for actions “TriggerRequestMore”, “TriggerRequestLess”, etc.).
In some implementations, the updating may include assigning one or more weight parameters to at least one of: at least one entity and one or more actions executed by the entity, and generating an updated model and an updated score for each of the actions executed by the entity based on the weight parameters. The weight parameters may be determined based on at least the received feedback data. In some implementations, the received feedback data may include one or more labels associated with at least one of: at least one entity and one or more actions executed by the at least one entity. The weight parameters may be determined based on a number of times the feedback data is received for at least one of: the entity and at least another entity being similar to the entity and determined to be within a predetermined distance of the entity. The received feedback data may include feedback data associated with at least another entity being similar to the entity. The received feedback data may include an aggregate feedback data associated with at least one entity and at least another entity being similar to the entity. The feedback data may include a feedback data associated with one or more actions executed by at least one of: at least one entity and at least another entity being similar to the entity. One or more actions may include at least one of the following: at least one identified action, an action identified for replacing the identified action, no action, and any combination thereof.
In some implementations, the method may include generating a consistency score for one or more of the investigative users of the system, the consistency score being determined based on receiving a number of times a similar feedback data for at least one of: at least one entity, at least another entity being similar to the entity and determined to be within a predetermined distance of the entity, and one or more actions executed by at least one of: at least one entity and at least another entity being similar to the entity, and any combination thereof, and determining, based on the generated consistency score, whether to use the received feedback data in the updating.
In some implementations, the method may include repeating at least one of the processing, the generating, the detecting, the extracting, the comparing, the training, the identifying, and the updating based on the received feedback data.
Non-transitory computer program products (i.e., physically embodied computer program products) are also described that store instructions, which when executed by one or more data processors of one or more computing systems, cause at least one data processor to perform operations herein. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g., the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,
In some implementations, the current subject matter may be configured to provide an efficient solution that may combine machine-learned models, and automatically incorporate feedback from multiple investigators, in order to find and act on a diverse set of rare, outlier events. The current subject matter may also provide capabilities for supervisors of teams of investigators to obtain feedback on performance to improve quality, consistency, investigator training and bias detection.
Some example applications may include financial crime prevention (e.g., fraud and money laundering) and industrial machine failure detection. In these uses, there is typically a cost for each entity that is investigated due to being over a threshold score, which has to be balanced against the cost of a missed detection. For example, in the case of money laundering, each case over a threshold requires human investigation, while minimizing the chance of missing criminal behavior. If only the most extreme 0.1% of cases can be investigated, it may be important to ensure that no important types of criminal behavior are missed (e.g., the algorithm only places unusual international activity in the top 0.1%, missing unusual cash activity). Similarly, in the machine failure case, a decision may be made to replace a part for a device with high likelihood of failure, and a threshold on the model will be used to make that decision. It is important to ensure that investigation does not only include a small set of system components (e.g., too many of the alerts indicate failure on one valve or motor).
In some implementations, the current subject matter may be configured to implement an unsupervised anomaly score, and focus investigation on a diverse and wide variety of outlier behaviors. Users of the current subject matter system may provide feedback on which types of new cases are interesting, and the system may adjust automatically to present more relevant entities for investigation. Eventually, a large enough sample of new types of outlier behavior may be found, and they may be incorporated into the training set of a supervised machine learning model. The goal of the supervised model may be to very efficiently find entities of interest, lowering false positive rates, for those types of behavior which are well known (and for which statistically large samples can be collected for training). Examples of new types of behavior might include high-velocity online gambling activity (in the AML domain), or a new failure mode of a motor or valve (in the machine failure prediction domain). The current subject matter system may provide a way to explore the topological space of behavior, direct investigation to those parts of the space that are of interest (rare, abnormal behavior), and efficiently find entities with similar behavior.
The components of the system 100 may include any combination of hardware and/or software. In some implementations, such components may be disposed on one or more computing devices, such as, server(s), database(s), personal computer(s), laptop(s), cellular telephone(s), smartphone(s), tablet computer(s), and/or any other computing devices and/or any combination thereof. In some implementations, these components may be disposed on a single computing device and/or can be part of a single communications network. Alternatively, or in addition to, the components may be separately located from one another.
The engine 104 may be configured to execute one or more functions associated with detection of diverse behavior. Such functions may be executed automatically, e.g., upon detection of a trigger (e.g., receipt of data associated with existing, new, etc. transactions), and/or manually. The devices 102 may refer to user devices, entities, and/or devices corresponding to entities, which may be users, applications, functionalities, computers, records, input data records, data structures, and/or any other type of information, data, device, etc. (which may be referred in the following description as “user devices”). In exemplary implementations, some device(s) 102 may be configured to issue queries, receive various results, provide feedback, and/or perform other functionalities associated with the process of detection of diverse behavior. The devices 102 may be equipped with video, audio, file sharing, user interface (e.g., screen) sharing, etc. hardware and/or software capabilities as well as any other computing and/or processing capabilities. The database(s) 106 may be configured to store various data (e.g., time-series data, and/or any other data) that may be accessed for the purposes of determining diverse behaviors.
In some implementations, the engine 104 may be configured to process a time-series data that may be received from a plurality of time-series data sources, such as, for example, database 106 and/or one or more of the user devices 102. The time-series data may represent one or more actions executed by an entity, such as one or more of the user devices 102. The database(s) 106 may be configured to store such time-series data. The time-series data may represent various input records indicative of entity's behavior (e.g., applying for credit, transferring funds, opening credit accounts, etc.).
Each entity may be associated with a particular entity profile or data structure. Such entity profile may be generated by the engine 104 and/or any other computing component. The profile may be configured to identify the entity and include one or more representations (e.g., generated by the engine 104) of the processed time-series data or historical behavior that may identify one or more actions executed by the entity.
The engine 104 may be configured to detect a current action (e.g., opening of a new credit account, commission of a fraudulent action, etc.) that may be executed by the entity. Any time-series data corresponding to such current action may be transmitted to the engine 104 and may be used to make updates to a data structure corresponding to the entity, thereby forming a concise profile of entity behavior. The engine 104 may be configured to analyze the time-series data associated with the current action for the purposes of determining whether the current action does not fit within the pattern of behavior of the entity, e.g., whether the current action corresponds to an outlier action.
In some implementations, the engine 104 may extract one or more features from the generated data structure, i.e., the entity profile, based on one or more current time-series data. Specifically, the features may be extracted from the entity profile and the input data associated with the current action. The extracted features of the entity (e.g., user 1 corresponding to and/or being associated with a user device 102a) may be compared to extracted features of another entity (e.g., user 2 corresponding to and/or associated with a user device 102b). Based on the comparison, difference parameters indicative of the differences between one or more first and second features may be determined. The features may be appropriately selected for the comparison. Using the comparison, the engine 104 may determine one or more distances and/or diversities between various entities.
The engine 104 may include one or more machine learning components and may perform training of one or more models using the difference parameters, during which a regularization may be applied to emphasize and/or de-emphasize certain portions, parts, regions, etc. of an input data space based on their diversity. Based on the trained models, a score for each of the actions executed by one or more entities may be determined for the purposes of determining outliers. The scores may help identify questionable actions (e.g., fraudulent, suspicious, etc. actions, activities) by the entities.
In some implementations, the engine 104 may be configured to update training of the models in response to receiving a feedback data (e.g., from one or more investigator-users, e.g., associated with a device 102c, that may correspond to one or more investigators, analysts, etc. (e.g., human users, and/or processors) reviewing actions by entities) received in response to the identified actions. The feedback may be used to identify other actions that may be similar. The user device 102c may include a user-interface 103 that may be used by one or more investigators to make various decisions about identified actions, and provide feedback which may be used to improve the quality of the detection process.
For each entity monitored by the system 100, their behavior may be observed using one or more input records. The system 100 may execute a behavioral profiling process that may receive data associated with each input record and update a behavioral profile of an entity (e.g., in persistent storage and/or database 106) based on a current event (e.g., data received in connection with a new transaction executed by an entity, such as a purchase, opening of a new line of credit, transfer of funds, etc.). The behavioral profile may include a concise, efficient representation of a time-series history of the entity's behavior. It may be efficient as compared to storing and/or retrieving all (or many) of the input records.
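As an illustrative, non-limiting sketch of such a profiling process, a behavioral profile may be kept as running counts and decayed averages per event type, so that the full input-record history need not be stored. The field names, event schema, and decay factor below are hypothetical, not specified by the present description.

```python
from collections import defaultdict

DECAY = 0.95  # assumed decay factor for the running average

def update_profile(profile, event):
    """Fold one input record into the entity's concise behavioral profile."""
    etype = event["type"]    # e.g., "wire_transfer", "cash_deposit"
    amount = event["amount"]
    profile["count"][etype] += 1
    # Exponentially decayed average keeps a concise view of history.
    prev = profile["avg_amount"][etype]
    profile["avg_amount"][etype] = DECAY * prev + (1 - DECAY) * amount
    return profile

profile = {"count": defaultdict(int), "avg_amount": defaultdict(float)}
update_profile(profile, {"type": "wire_transfer", "amount": 250.0})
update_profile(profile, {"type": "wire_transfer", "amount": 150.0})
```

Such a profile is small and updatable in O(1) per event, in contrast with re-reading all historical records.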
In some implementations, in connection with the entity's profile, a feature vector xi may be constructed as a function of the input record and an entity's profile, so that xi may include information about the current event as well as a historical state. For unsupervised learning, a machine learning model (that may be part of the engine 104) may be configured to estimate a probability p(xi) of that entity's behavior. The engine 104 may be further configured to estimate a density of events that may be associated with an entity behavior.
The engine 104 may be configured to use, for instance, a classifier adjusted density estimation (CADE), to determine density. Using CADE, a supervised classifier and base density estimate may be combined to generate an estimate {circumflex over (p)}(xi) of a true density p(xi). A base density S may be easily sampled from (e.g., with independence assumptions) to generate {circumflex over (p)}(xi|S). A supervised classifier, e.g., a neural network, may be trained to distinguish between data that may be received from an original population T and/or from the base density S. The CADE estimation of the probability of the feature vector may be expressed as follows:
{circumflex over (p)}(xi|T)={circumflex over (p)}(xi|S)·{circumflex over (p)}(T|xi)/{circumflex over (p)}(S|xi)  (1)
where, {circumflex over (p)}(T|xi) is the classifier estimate that xi was drawn from the observed data T, and {circumflex over (p)}(S|xi) is the classifier estimate that xi was drawn from the base density S.
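A minimal sketch of the CADE estimate above, assuming scikit-learn is available: the base density S is sampled here by independently permuting each feature column of the observed data T (one easy-to-sample choice with independence assumptions; the data and classifier choice are illustrative, not prescribed).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Observed data T (rows = feature vectors x_i) -- synthetic stand-in.
T = rng.normal(size=(500, 3))

# Base density S: permute each column of T independently, preserving
# marginals but removing dependence between features.
S = np.column_stack([rng.permutation(T[:, k]) for k in range(T.shape[1])])

# Supervised classifier trained to distinguish T (label 1) from S (label 0).
X = np.vstack([T, S])
y = np.concatenate([np.ones(len(T)), np.zeros(len(S))])
clf = LogisticRegression().fit(X, y)

def cade_density(x, p_base):
    """CADE: p_hat(x|T) = p_hat(x|S) * p_hat(T|x) / p_hat(S|x)."""
    p_s, p_t = clf.predict_proba(x.reshape(1, -1))[0]  # classes_ == [0, 1]
    return p_base * p_t / p_s
```

In practice the classifier would be the neural network referenced above; logistic regression keeps the sketch short.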
The engine 104 may be configured to determine a score to distinguish common behavior of an entity from a rare behavior using the above learned density estimate. In particular, the engine 104 may be configured to determine an OutlierScore—a function of the CADE estimate of the probability (as well as other components of the representation vector), using the following:
OutlierScorei=ƒ(1−{circumflex over (p)}(xi|T), xi)∈[0, 999]  (2)
The OutlierScore may be calibrated so that low-likelihood behavior is assigned higher scores (maximum of 999) and high-likelihood behavior is assigned lower scores (minimum of 1). By knowing the score distribution on historical data, the engine 104 may select a score threshold to identify the most important behavior, while limiting the number of entities that need to be reviewed by the investigative user teams. Entities with scores greater than this threshold may be referred to as alerts, and these alerts may be investigated by one or more of the investigator users (e.g., using an investigator UI 103 of user device 102c). Certain risky activity and/or failure modes may be known, and/or codified into rules by expert judgement and/or trained into the supervised learning model. When these rules are triggered, the engine 104 may generate additional alerts, which also may be investigated.
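One hedged way to realize the calibration described above is a rank-based mapping of density estimates into [1, 999]; the function ƒ in equation (2) is left open by the description, so this particular choice is an assumption for illustration.

```python
import numpy as np

def outlier_score(p_hat):
    """Map density estimates to [1, 999]: low likelihood -> high score.
    Rank-based calibration (an assumed choice of the function f)."""
    ranks = np.argsort(np.argsort(-p_hat))  # rank 0 = most likely entity
    return 1 + (ranks * 998) // max(len(p_hat) - 1, 1)

p_hat = np.array([0.30, 0.25, 0.20, 0.001])  # last entity is least likely
scores = outlier_score(p_hat)

# Score threshold chosen from the historical score distribution;
# entities above it become alerts for investigator review.
threshold = 900
alerts = np.where(scores > threshold)[0]
```

Here only the least-likely entity crosses the alert threshold, bounding investigator workload as described.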
One of the issues with existing approaches to the above problem is that the types of rare events, outliers and anomalies found may not produce a wide enough diversity of entities to examine, and so may not cover all the behavior that should be investigated and remediated. The current subject matter system 100, as discussed in further detail below, may be configured to resolve these issues, such as, for example, by providing one or more metrics to determine diversity, and solutions to address, analyze and/or investigate diverse behavior during model training, as well as during on-line operations. Users (e.g., user of device 102c) may interact with the system 100 using user interfaces (e.g., investigator UI 103 of the device 102c) by investigating alerts, some of which may be opened into cases, which may pass through multiple levels of investigation. Cases may eventually be decided to be either normative (e.g., no further action needed), or confirmed as needing action. Further actions may include, for example, but not limited to, reporting the case to a regulatory authority in anti-money laundering, replacing a component in machine failure prevention, etc.
In some implementations, the system 100 may be used in operations where there is already a legacy system which may have generated alerts based on a relatively simple set of rules that monitor entity behavior. Investigator users may have formed decisions about a set of entities (e.g., labelling them as “good” or “bad”). These labels may be used in a semi-supervised learning process discussed below.
At 316, the system 100, such as, one or more user devices 102c and/or engine 104, may be configured through a user-interface (e.g., investigator UI 103 of the user device 102c) to review any alerts that have been generated and/or transactions associated with the entities that have been placed into queues (at 312 and 314). If no further actions are warranted on some such alerts/transactions, any further investigation may be closed, at 318. Otherwise, if “interesting” activities are determined, the engine 104 and/or user 102c may escalate the alert/transaction (and/or account associated with the alert/transaction) further, at 320.
As a result, the system 100 may be configured to perform a secondary review of such alerts/transactions that have been identified, at 322. If no further actions are warranted on some such alerts/transactions, any further investigation may be closed, at 324. Otherwise, the system 100 may confirm that these alerts/transactions warrant further review, at 326, and additional investigation of details of such alerts/transactions may be necessary to obtain a resolution, at 328.
In some implementations, the system 100 may be configured to execute a determination and/or measurement of diversity of behavior of a particular entity as compared with a set of other entities.
Referring to
The parameters of the embedding may be learned using a neural network and/or using a probabilistic model, such as, for example, latent Dirichlet allocation. The dimensionality of such an embedding may typically be 10-20, which may be a large reduction from the hundreds of “words” possible. At model scoring time, an inference algorithm may be executed to update the embedding for each entity, and may incorporate it into the representation vector xi.
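As a non-limiting sketch of learning such an embedding with latent Dirichlet allocation, each entity can be represented by counts of behavioral "words", and the fitted topic mixture serves as the low-dimensional embedding folded into xi. The count matrix, vocabulary size, and 12-component choice are hypothetical; scikit-learn is assumed available.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Counts of behavioral "words" (e.g., binned event types) per entity:
# 200 entities over a 50-word behavioral vocabulary (synthetic stand-in).
counts = rng.integers(0, 5, size=(200, 50))

# Embedding dimensionality of 12 falls in the 10-20 range noted above.
lda = LatentDirichletAllocation(n_components=12, random_state=0).fit(counts)
theta = lda.transform(counts)  # per-entity topic mixture = embedding
```

At scoring time, `lda.transform` on an updated count vector plays the role of the inference algorithm that refreshes the entity's embedding.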
Once the representation of time-series of entity behavior has been determined for each entity, the system 100 may be configured to execute a pairwise similarity of behavior, at 404. The system 100 may be configured to compare a pair of entities. For example, the system 100 may use an Euclidean distance as a metric. However, such metrics may be associated with various difficulties in high-dimensional space. For example, the Euclidean distance between two arbitrary vectors may tend to converge as the dimensionality gets higher. The use of alternative metrics may alleviate this problem, such as, the Lp family of metrics, which have shown benefits for p<1 in such spaces. The Lp norm as a metric for the distance between two representation vectors xi, xj, may be expressed as follows:
Lp(xi, xj)=∥xi−xj∥p=(Σk|xk,i−xk,j|p)1/p  (3)
Based on exemplary, non-limiting, experimental implementations of the current subject matter for a variety of p values, p=2.0 (Euclidean distance), p=1.0, and p=0.1, it was determined that p=1.0 and p=0.1 provided intuitively better distance measures than the p=2.0 (Euclidean) distance, which undesirably often reported high distances between behavior that was quite similar.
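Equation (3) can be sketched directly; note that for p < 1 the Lp expression is not a true metric (the triangle inequality fails), but as discussed above it can separate points more usefully in high-dimensional spaces. The vectors below are illustrative only.

```python
import numpy as np

def lp_distance(xi, xj, p=0.5):
    """Lp 'distance' of equation (3); fractional p < 1 is allowed."""
    return float(np.sum(np.abs(xi - xj) ** p) ** (1.0 / p))

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
# Smaller p stretches the apparent distance between distinct vectors:
# p=2 gives sqrt(2), p=1 gives 2, p=0.5 gives 4 for this pair.
```

This stretching is why fractional norms can rank near-identical behavior as much closer than genuinely different behavior.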
Once the distance is determined, the system 100 may be configured to execute a determination of diversity of behavior among entities, at 406. In some implementations, the distance measure provided by equation (3) may be used to determine how similar or diverse the behavior of sets of entities is. For a particular entity xi and a set of related entities H, the diversity may be determined using the following:
Diversity(xi)=Σxj∈H, j≠iLp(xi, xj)  (4)
If the behaviors of entities in H are very different from each other, the diversity measure will be higher for each entity xi, as compared with other sets of entities with more similar behavior. Entities with high diversity values may be farther away from their neighbors than entities with low diversity. Determining the pairwise distances in the diversity measure may be time-consuming if the set H is large. To avoid the O(n2) computational burden of calculating the metric on all data points, the system 100 may be configured to restrict such determination to a subset of entities, e.g., based on a model score. Further, the diversity metric may be determined on the least likely entities (e.g., based on high scores), and/or the most likely entities (e.g., based on low scores).
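A sketch of the diversity measure over a set H, assuming the sum-of-pairwise-Lp-distances form used elsewhere in this description; in practice H would be restricted to the top-scoring (or bottom-scoring) entities to avoid the O(n2) cost over the full population. The example points are synthetic.

```python
import numpy as np

def diversity(H, p=1.0):
    """Diversity(x_i) = sum of pairwise Lp distances from x_i to the
    rest of the set H (the diagonal contributes zero)."""
    diffs = np.abs(H[:, None, :] - H[None, :, :]) ** p
    D = diffs.sum(axis=2) ** (1.0 / p)
    return D.sum(axis=1)

# Two near-duplicate entities and one isolated one.
H = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
div = diversity(H)
# The isolated third entity receives the largest diversity value,
# i.e., it is farther from its neighbors than the other two.
```

Restricting H by model score before calling `diversity` keeps the pairwise computation tractable, as noted above.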
A visualization of distance and diversity, at 408, may follow the determination of diversity of behavior. In some implementations, the device 102c of the system 100 may be configured to visualize (e.g., via user interface 103) the requested entity distances from each other using a low-dimensional representation. Such low-dimensional representations may be constructed with t-distributed stochastic neighbor embedding (t-SNE), which converts similarities in a vector space of data points to probabilities and attempts to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data. t-SNE defines the joint probabilities Pij that measure similarities between Xi and Xj using the following:
Pij=(pj|i+pi|j)/2n, where pj|i=exp(−∥Xi−Xj∥2/2σi2)/Σk≠iexp(−∥Xi−Xk∥2/2σi2)
Calculation of t-SNE may be executed using a standard deviation σ chosen in such a way that the perplexity of Pi equals a user-predefined perplexity. Once the 2-dimensional components for each entity are determined, a 2-dimensional (2D) plot may be generated and displayed on a user interface of the device 102c to visualize all (and/or a subset) of the analyzed entities.
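Assuming scikit-learn is available, the 2D embedding for the investigator plot can be sketched as follows; the representation vectors are synthetic stand-ins, and the perplexity value is the user-tunable knob mentioned above.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Representation vectors x_i for 100 entities (synthetic stand-in).
X = rng.normal(size=(100, 12))

# 2-D t-SNE embedding; each row becomes a point on the 2D scatter plot
# shown in the investigator UI.
emb = TSNE(
    n_components=2, perplexity=10, init="random", random_state=0
).fit_transform(X)
```

The resulting `emb` array provides one (x, y) coordinate per entity for the visualization at 408.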
In some implementations, the system 100, and in particular, for example, engine 104 may be configured to execute training to increase system 100's response to diverse behavior.
To adjust for low diversity in high-scoring population, at 502, the system 100 may be configured to adjust an objective during training time to increase the diversity of the elements that may be found as the highest scoring (e.g., most extreme). This may be interpreted as a regularization of the outlier space to “flatten” and/or normalize the density estimate in the low-probability regions of the domain. In some implementations, the engine 104 may be configured to perform a training-time optimization process to enhance diversity of the outlier population, which may be executed based on finding the distances between elements which are determined to be outliers after some amount of training. The diversity may be a function of the pairwise distances between elements in the set of outliers. Elements with higher diversity factors may on average be further away from most of the other outliers, and may be encouraged to rank relatively higher among the set of outliers.
In particular, as part of the optimization process, after sufficient training of the probability density estimator has occurred (epochs&gt;M), the engine 104 may be configured to determine a set H of highest scoring (least likely) elements and determine one or more pairwise distances between elements xi in set H. Distance may be the Lp(xi, xj) metric from equation (3) and expressed as follows: distance(xi, xj)=Lp(xi, xj). Then, for each element xi in H, the engine 104 may construct Diversity(xi), which is a function of all the pairwise distances in H, e.g., Diversity(xi)=Σj≠idistance(xi, xj), and determine the subset H′ of H which has the lowest diversity, H′={xi∈H|Diversity(xi)&lt;−2σ}, where σ is the standard deviation of the distribution of Diversity.
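The subset selection above can be sketched as follows; the "−2σ" condition is read here as diversity falling more than two standard deviations below the mean of the Diversity distribution, which is an interpretive assumption, and the sample values are synthetic.

```python
import numpy as np

def low_diversity_subset(H, div):
    """Select H' = entities whose Diversity is more than 2 sigma below
    the mean of the Diversity distribution (assumed reading of the
    'Diversity(x_i) < -2*sigma' condition)."""
    mu, sigma = div.mean(), div.std()
    mask = div < mu - 2.0 * sigma
    return H[mask], mask

# Twenty entities with similar diversity and one clustered straggler.
div = np.concatenate([np.full(20, 10.0), [1.0]])
H = np.arange(42, dtype=float).reshape(21, 2)
H_prime, mask = low_diversity_subset(H, div)
# Only the low-diversity straggler lands in H'.
```

H′ then feeds the regularization term of the training objective described next.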
Subsequent to the determination of lowest diversity in the optimization process, the engine 104 may be configured to optimize the probability density estimator with an additional regularization as a function of Diversity. When the CADE approach is used (ŷ is a neural network approximation to the probability that xi is drawn from the true vs. base density), the cost function becomes J=ΣiRi*(ŷi−yi)2, where the regularization factor Ri is,
and γ1 may be selected so that Ri>1 for those low-diversity entities where reduction of score is desired. The probability density estimation in the optimization process may include an unsupervised learning algorithm. For example, for CADE, elements with higher diversity factor having a lower estimate P(x|T) may be of interest, so those samples may be weighted less in training by using a low γ1 in the regularization. They may be considered less likely by the model (i.e., higher scoring in applications where higher scores indicate more anomalous samples).
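The regularized cost can be sketched as below. Since the exact form of Ri is not spelled out here, this sketch assumes the simplest reading consistent with the text: Ri takes the value γ1 on the low-diversity subset H′ and 1 elsewhere, with γ1 chosen to weight those samples up or down as described.

```python
import numpy as np

def regularized_cost(y_hat, y, low_div_mask, gamma1=0.2):
    """J = sum_i R_i * (y_hat_i - y_i)^2, with R_i = gamma1 on the
    low-diversity subset H' and R_i = 1 elsewhere (assumed form)."""
    R = np.where(low_div_mask, gamma1, 1.0)
    return float(np.sum(R * (y_hat - y) ** 2))

# First sample is in H' and is down-weighted by gamma1 = 0.2.
y_hat = np.array([0.9, 0.2])
y = np.array([1.0, 0.0])
mask = np.array([True, False])
J = regularized_cost(y_hat, y, mask)
```

Choosing γ1 &gt; 1 instead would emphasize the same subset, matching the alternative selection of Ri &gt; 1 mentioned above.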
Referring back to
In some exemplary, experimental implementations applying both the optimization and regularization processes to the AML domain, the number of top-scoring customers who had triggered SAR filings increased from 63/250 (Baseline) to 140/250 (with both processes), an increase of 122%.
The regularization process to reduce the impact of low-scoring, low-diversity customers on the estimation of outliers may be initiated after sufficient training of the probability density estimator has occurred (epochs&gt;M). In particular, the engine 104 may determine a set G of lowest scoring (most likely) elements and determine one or more pairwise distances between elements xi in set G. Distance may be the Lp(xi, xj) metric from equation (3) and expressed as: distance(xi, xj)=Lp(xi, xj). For each element xi in G, the engine 104 may determine Diversity(xi), which is a function of all the pairwise distances in G, e.g., Diversity(xi)=Σj≠idistance(xi, xj). Then, the engine 104 may determine a subset G′ of G which has the lowest diversity, using G′={xi∈G|Diversity(xi)&lt;−2σ}, where σ is the standard deviation of the distribution of Diversity.
The next operation in the regularization process may include optimizing the probability density estimator with an additional regularization as a function of Diversity. When the CADE approach is used (ŷ is a neural network approximation to the probability that xi is drawn from the true vs. base density), the cost function becomes J=ΣiRi*(ŷi−yi)2, where the regularization factor Ri is,
and γ2 may be selected so that Ri<<1 for those low-diversity entities where it may be desirable to de-emphasize those to the CADE neural network.
In some implementations, the current subject matter system may be configured to incorporate user feedback in the training process to enhance diversity. In rare event detection, class labels for some data may be known, due to investigators providing feedback on earlier alerts (generated either by the unsupervised score or by a rules-based system). For data known to be from the rare class, it may be desirable for similar data to be modeled as low likelihood. In this semi-supervised case, the distance metric may be used to find data near those rare classes, and training may be regularized so that similar entities score higher. The labeled samples might not be directly observed during the unsupervised model estimation.
The engine 104 may be configured to execute a semi-supervised approach to using a small amount of labeled data to enhance the diversity of outliers found by the model. Using this approach, previously labeled entities are referred to as “bad” when they have been dispositioned as important (e.g., an entity for whom a SAR was filed in AML, and/or a machine that has been confirmed to fail).
The engine 104 may be configured to execute the semi-supervised approach after sufficient training of the probability density estimator has occurred (epochs>M). In this case, the engine 104 may determine a set H of the highest scoring (least likely) entities, determine a set B of previously labeled bad entities, and determine one or more pairwise distances Lp(xi, xj) between entities in set H and entities in set B. Then, for each element xi in H and xj in B, the engine 104 may determine a minimum distance to a bad entity as minDistToBad(xi)=min(Distance(xi, xj)), and determine a set H′ which is closest to the bad entities, as H′={xi∈H|minDistToBad(xi)<−σ}, where σ is the standard deviation of the distribution of minDistToBad.
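This selection may be sketched as follows, assuming numpy feature vectors and interpreting the "<−σ" threshold as one standard deviation below the mean of the minDistToBad distribution (the centering is not stated explicitly in the text):

```python
import numpy as np

def closest_to_bad_subset(H, B, p=2):
    """H: (n, d) array of highest-scoring (least likely) entities.
    B: (m, d) array of previously labeled 'bad' entities.
    Returns indices of H' -- elements of H whose minimum Lp distance to
    any bad entity is more than one standard deviation below the mean."""
    # Pairwise Lp distances between each x_i in H and each x_j in B
    diffs = np.abs(H[:, None, :] - B[None, :, :]) ** p
    dist = diffs.sum(axis=2) ** (1.0 / p)        # shape (n, m)
    min_dist_to_bad = dist.min(axis=1)           # minDistToBad(x_i)
    mu, sigma = min_dist_to_bad.mean(), min_dist_to_bad.std()
    return np.where(min_dist_to_bad < mu - sigma)[0]
```

The entities in H′ are the high-scoring entities nearest the labeled bad examples, which the subsequent regularization may emphasize.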
Subsequently, the engine 104 may optimize the probability density estimator with an additional regularization as a function of Diversity. When the CADE approach is used (ŷ is a neural network approximation to the probability that xi is drawn from the true vs. base density), the cost function may become J=ΣiRi*(ŷi−yi)2, where the regularization factor Ri may be expressed as
and γ3 may be selected so that Ri<1 for those entities in B for which an increased score is desired.
In some implementations, the current subject matter may be configured to incorporate investigator (e.g., user of device 102c shown in
At 916, the system 100, such as one or more users 102c and/or the engine 104, may be configured to review any alerts (e.g., via user interface 103) that have been generated and/or transactions associated with the entities that have been placed into queues (at 904 and 906). If no further actions are warranted on some such alerts/transactions, any further investigation may be closed, at 918. Otherwise, if “interesting” activities are determined, the engine 104 and/or user 102c may escalate the alert/transaction (and/or account associated with the alert/transaction) further, at 920.
As a result, the system 100 may be configured to perform a secondary review of such alerts/transactions that have been identified, at 922. If no further actions are warranted on some such alerts/transactions, any further investigation may be closed, at 924. Otherwise, the system 100 may confirm that these alerts/transactions warrant further review, at 926, and additional investigation of details of such alerts/transactions may be necessary to obtain a resolution. In particular, one or more requests for more entities that are similar to the “interesting” entities may be triggered, at 928, and the processing may return to 902.
Moreover, once the system 100 determines that no further action is necessary, at 918 and/or at 924, the system may be configured to trigger further requests. For example, a request for fewer entities that are similar to the currently evaluated entity may be triggered, at 930. Alternatively, or in addition, a request for entities that are more diverse from the currently evaluated entity may be triggered, at 932.
In some implementations, over time, the system 100 may learn from that feedback and provide a set of enriched entities that may be close in distance to what the investigators have found important in the past and may be looking for, which may be expressed as follows:
Distance(xi, xj)=Lp(xi, xj)<−2σ (7)
One way the system 100 may accomplish that is by tracking all (and/or a subset) of the requested entities and assigning weights (at 902) based on the number of times a certain entity is close to, or far from, a requested customer. An entity i that appears as a result of requests from multiple users k may be assigned a higher weight (and so prioritized for investigation), and conversely entities that appear multiple times in the “less” category may be assigned a lower weight. The weight of each request may decay over time so as not to bias the requests towards older entities that constantly trigger rules. The following expression may be used to determine an entity weight:
where TriggerRequestMore is the number of times the entity i is requested as being close in Distance(xi, xj) to a customer j of interest under investigation; TriggerRequestLess is the number of times the entity i is requested as being close in Distance(xi, xj) to an uninteresting customer j under investigation; and dt is the number of days since investigator k requested more/less of an entity.
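The expression for EntityWeight itself is not reproduced above; one plausible, hedged reading of the described behavior (an exponentially time-decayed sum of "more" votes minus "less" votes, with a hypothetical half-life parameter) may be sketched as:

```python
import math

def entity_weight(requests, now, half_life_days=30.0):
    """requests: list of (kind, day) tuples for entity i, where kind is
    'more' (TriggerRequestMore) or 'less' (TriggerRequestLess) and day is
    the day investigator k made the request.  The decay form and half-life
    are illustrative assumptions, not taken from the text."""
    decay = math.log(2) / half_life_days
    weight = 0.0
    for kind, day in requests:
        dt = now - day                          # days since the request
        contribution = math.exp(-decay * dt)    # older requests count less
        weight += contribution if kind == "more" else -contribution
    return weight
```

Under this sketch, a fresh "more" request contributes +1 to the weight, a fresh "less" request contributes −1, and a request one half-life old contributes half as much.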
The EntityWeight based on investigator feedback may then be used to scale the unsupervised OutlierScore, using the following:
WeightedOutlierScorei=OutlierScorei*(1+α*EntityWeighti) (9)
where α is a scaling factor to account for operational constraints and workload.
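Equation (9) may be sketched directly:

```python
def weighted_outlier_score(outlier_score, entity_weight, alpha=0.5):
    """Equation (9): scale the unsupervised OutlierScore by investigator
    feedback.  alpha (a scaling factor for operational constraints and
    workload) defaults to an illustrative value."""
    return outlier_score * (1.0 + alpha * entity_weight)
```

A positive EntityWeight lifts the score (potentially above the minimum alert threshold), while a negative weight suppresses it.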
Once the weights are determined, an investigator user may customize the amount, frequency, and type of entity they want to prioritize. This may also be performed in conjunction with the score, where the weights are used to adjust scores. New entities (e.g., un-alerted entities (at 908-910)) that have a positive weight from requests, and that otherwise would not have crossed the required threshold, may be alerted. High-scoring entities that have a negative weight from requests may see their score drop below a minimum alert threshold. Entities in the former group may then be added to the OutlierScore Queue (at 904) to be worked by an investigator, and entities in the latter group may be moved back into a non-alerting node.
During the phase 1202, an un-alerted customer, at 1201, may receive a positive weight based on similar customers being requested, at 1209 during phase 1204, which, in turn, may increase their OutlierScore, at 1211. If the new WeightedOutlierScore is greater than a minimum alert threshold, at 1213, the customer may be moved to the OutlierScore Alert Queue, at 1215, during phase 1206.
Conversely, a high scoring customer, at 1203, that gets negative weights based on similar customers being denoted TriggerRequestLess by investigators, at 1205 during phase 1204, may see their OutlierScore reduced, at 1207. Thus, if the new WeightedOutlierScore is below the threshold, at 1213, this customer may be moved to a Closed No Action queue, at 1217, during phase 1206.
In some implementations, for immediate individual investigator requests, the system 100 may be used by an individual investigator during a review of a specific entity to immediately view other entities that are close in distance using the pairwise distance metric discussed above. The investigator may then review those entities and select to escalate accordingly, in a similar fashion to a review of entity networks. Here, the review may be focused on the type of interesting activity found for that initial customer, rather than a full entity review. In cases where the entity requested has already triggered an existing alert, the investigator may close it with interest.
In some implementations, the system 100 may include a module to track and/or supervise performance of the investigator(s). The supervisor role of the system is presented with which investigators may be requesting interesting cases, and may weigh those requests accordingly and/or use them for training purposes to help improve the overall process. As the system 100 tracks the performance of all requests for more and/or less similar cases, it may then look at the likelihood that other investigators will find interesting cases in those requests and compare it against the likelihood that the initial analyst finds the activity interesting. Differences of more than a particular statistical metric may then be flagged and sent to a specific team that may compare the decisions made by an analyst against current guidelines, to help with coaching or to update existing guidelines if the activity warrants it. This approach differs from a traditional performance evaluation that currently exists in most institutions, where an investigator is evaluated based on the actual customers being investigated. The novel assessment presented here looks at how the requested entities, that are near the investigated entity, are dispositioned by other investigators. Each investigator may be assigned an InvestigatorConsistencyScore based on the outcome of the recommended entities. If the score falls below a certain threshold, the investigator may be flagged for review.
The system 100 may also determine InvestigatorInterestingPercentage as a ratio of interesting entities found, over the total entities recommended, weighted by the number of times the entity has been recommended by other investigators as well, as follows:
where, Customerk=Un-Alerted customer, recommended for investigation, that is close in distance to original customer under review; Weightk=Number of times the customer has also been requested by other investigators when reviewing other customers; InterestingFlag =Flag disposition for when a customer has interesting activity; and dt,k=Number of days t since investigator k requested more of the customer.
This score may decay over time so as not to bias the score towards older dispositions. That percentage may then be normalized and bounded to generate a final InvestigatorConsistencyScore as follows:
where, MaxScore is the upper bound for the highest possible score achievable by any investigator; α and β are predefined shift and scale parameters to adjust the distribution of scores to fit requirements.
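The formulas for InvestigatorInterestingPercentage and InvestigatorConsistencyScore are not reproduced above; the following is a hedged sketch of one plausible reading (a time-decayed, weight-normalized ratio of interesting dispositions, then shifted, scaled, and bounded), with the decay form, half-life, and bounding all illustrative assumptions:

```python
import math

def interesting_percentage(recommendations, half_life_days=90.0):
    """recommendations: list of dicts with keys 'weight' (times the
    customer was also requested by other investigators), 'interesting'
    (disposition flag), and 'dt' (days since investigator k requested
    more of the customer)."""
    decay = math.log(2) / half_life_days
    num = den = 0.0
    for r in recommendations:
        d = r["weight"] * math.exp(-decay * r["dt"])  # decayed weight
        if r["interesting"]:
            num += d
        den += d
    return num / den if den else 0.0

def consistency_score(pct, max_score=999.0, alpha=0.0, beta=1.0):
    # Shift (alpha), scale (beta), and bound the percentage into
    # [0, MaxScore]; parameter names follow the text, values are assumed
    return min(max_score, max(0.0, (pct + alpha) * beta * max_score))
```

Under this reading, non-interesting dispositions that were heavily requested by other investigators drag the percentage (and hence the consistency score) down the most, consistent with the investigator comparison described below.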
As shown in
Results for investigator 2 (in table 1504) also show 5 of the 8 customers were not interesting. However, the score here is considerably lower since those non-interesting customers were on average weighted higher. 2 of the 8 non-interesting customers were not recommended by any other investigator, while the 3 interesting ones were recommended elsewhere and thus were weighted less heavily. Investigator 2 scores low (301) because they are highlighting customers that others are not, and that are not interesting. Given the low score, investigator 2 should be reviewed to determine why they are highlighting those customers, and provided with the appropriate training.
Thus, in some implementations, the current subject matter may be configured to provide one or more of the following advantages and/or features. It may serve as an effective training-time method to score entities for outliers and rare events, including an ability to enhance the diversity of outliers found by unsupervised machine learning. It may also function as a run-time system where these scores may be presented to multiple investigative users, and their feedback is included in the system such that more novel and interesting types of rare entity behavior are found over time. Once sufficient numbers of examples of a new behavior are found, they can be included in the training set of a supervised machine learning model, to enhance the efficiency of detecting these new behaviors. Additionally, the current subject matter may provide a way of evaluating the consistency of each investigator's contributions to the feedback process, so that the process of discovering new entity behavior can be well-governed.
In some implementations, the current subject matter may be configured to be implemented in a system 1600, as shown in
At 1702, the engine 104 may process a time-series data record received from a plurality of time-series data sources. The time-series data record may represent one or more actions executed by an entity in a plurality of entities and stored by at least one time-series data source in the plurality of time-series data sources.
At 1704, the engine 104 may generate a data structure (e.g., an entity profile) corresponding to the entity. The generated data structure may identify the entity and include one or more representations of processed time-series data (e.g., historical behavior) identifying one or more types of observed behavior or actions executed by the entity. These behaviors and actions may include, for example, opening an account, transferring funds, the temperature of a motor, etc.
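The data structure generated at 1704 may be sketched as a small entity profile; the field and method names below are illustrative assumptions, not taken from the text:

```python
from dataclasses import dataclass, field

@dataclass
class EntityProfile:
    """Identifies an entity and carries representations of its processed
    time-series history (e.g., account openings, fund transfers, or
    motor-temperature readings)."""
    entity_id: str
    history: list = field(default_factory=list)  # processed time-series representations

    def record(self, action, value, timestamp):
        # Append one processed observation of the entity's behavior
        self.history.append((timestamp, action, value))
```

Feature extraction at 1708 would then operate over `history` together with the current input record.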
At 1706, the engine 104 may detect a current action, behavior and/or state of the entity and receive one or more current time-series data that correspond to the current action and are associated with the data structure corresponding to the entity. The engine 104 may be configured to detect outliers in the current event, behavior and/or state.
At 1708, one or more first features may be extracted by the engine 104 from the generated data structure based on one or more current time-series data. In particular, the engine 104 may perform feature extraction from the entity profile and current input data.
At 1710, the engine 104 may compare one or more extracted first features and one or more second features extracted for at least another entity in the plurality of entities. The engine 104 may then determine, based on the comparison, one or more difference parameters being indicative of differences between the selected one or more first and second features. In particular, the engine 104 may determine distances and/or diversity of entities, as discussed above.
At 1712, the engine 104 may perform training of one or more models, using the difference parameters, where selection of over- or under-representation of training exemplars may be performed. These refer to the representation weight of a training exemplar in the model training (each record may be assumed to have equal weight, but this parameter allows a record to be more or less important in its contribution to the final training). The engine 104 may then determine, using the trained models, a score for each of the data records received from the entity. Thus, one or more outlier actions, behaviors and/or states may be determined.
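Per-exemplar representation weights may be sketched as follows; a simple weighted logistic scorer stands in here for whatever model is trained at 1712, so the model form, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def train_with_representation_weights(X, y, weights, epochs=100, lr=0.5):
    """Fit a minimal logistic scorer where each training exemplar carries a
    representation weight (1.0 = equal weight; >1 over-represents and <1
    under-represents the record in its contribution to training)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # current predicted scores
        grad = weights * (p - y)                  # weighted error term
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def score(X, w, b):
    # Higher score = more anomalous in this sketch
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))
```

Setting an exemplar's weight above or below 1.0 scales its gradient contribution, which is one straightforward way to realize the over-/under-representation described above.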
At 1714, at least one action (in the one or more actions executed by the entity) may be identified by the engine 104 based on the determined scores. Such actions may be determined to be questionable (e.g., fraudulent, etc.).
At 1716, the engine 104 may update the training of one or more models in response to receiving a feedback data responsive to the identified at least one action, and identify at least another action.
In some implementations, the current subject matter can include one or more of the following optional features. At least one of the first features and the second features may include one or more latent features. The training of the models may be performed using the selected first and second features.
In some implementations, the training may include selecting at least one over- and under-representation of a training exemplar or no change to representation.
In some implementations, the feedback data may include feedback data responsive to a utility of the identified at least one action.
In some implementations, the processing may include monitoring the actions executed by the entity, and receiving the time-series data from the plurality of time-series data sources. The actions, behaviors and/or state of the entity may be summarized by one or more representations and may include at least one previously executed action (e.g., historical actions by the entity).
In some implementations, the time-series data may be received during at least one of the following time periods: one or more periodic time intervals, one or more irregular time intervals, and any combination thereof. The time-series data may represent one or more actions executed by the entity during a predetermined period of time.
In some implementations, at least one entity and at least another entity may include at least one of the following: related entities, unrelated entities, and any combination thereof.
In some implementations, one or more difference parameters of the representations may include at least one of the following: latent parameters determined for least comparable entities, parameters determined for most comparable entities, and any combination thereof. This may include a diversity metric for least/most likely entities.
In some implementations, at least another identified action may include at least one of the following: an action identified in addition to the at least one identified action, an action identified for replacing at least one identified action, no action, and any combination thereof (e.g., feedback requests for actions “TriggerRequestMore”, “TriggerRequestLess”, etc.).
In some implementations, the updating may include assigning one or more weight parameters to at least one of: at least one entity and one or more actions executed by the entity, and generating an updated model and an updated score for each of the actions executed by the entity based on the weight parameters. The weight parameters may be determined based on at least the received feedback data. In some implementations, the received feedback data may include one or more labels associated with at least one of: at least one entity and one or more actions executed by the at least one entity. The weight parameters may be determined based on a number of times the feedback data is received for at least one of: the entity and at least another entity being similar to the entity and determined to be within a predetermined distance of the entity. The received feedback data may include feedback data associated with at least another entity being similar to the entity. The received feedback data may include an aggregate feedback data associated with at least one entity and at least another entity being similar to the entity. The feedback data may include a feedback data associated with one or more actions executed by at least one of: at least one entity and at least another entity being similar to the entity. One or more actions may include at least one of the following: at least one identified action, an action identified for replacing the identified action, no action, and any combination thereof.
In some implementations, the method may include generating a consistency score for one or more of the investigative users of the system, the consistency score being determined based on receiving a number of times a similar feedback data for at least one of: at least one entity, at least another entity being similar to the entity and determined to be within a predetermined distance of the entity, and one or more actions executed by at least one of: at least one entity and at least another entity being similar to the entity, and any combination thereof, and determining, based on the generated consistency score, whether to use the received feedback data in the updating.
In some implementations, the method may include repeating at least one of the processing, the generating, the detecting, the extracting, the comparing, the training, the identifying, and the updating based on the received feedback data.
The systems and methods disclosed herein can be embodied in various forms including, for example, a data processor, such as a computer that also includes a database, digital electronic circuitry, firmware, software, or in combinations of them. Moreover, the above-noted features and other aspects and principles of the present disclosed implementations can be implemented in various environments. Such environments and related applications can be specially constructed for performing the various processes and operations according to the disclosed implementations or they can include a general-purpose computer or computing platform selectively activated or reconfigured by code to provide the necessary functionality. The processes disclosed herein are not inherently related to any particular computer, network, architecture, environment, or other apparatus, and can be implemented by a suitable combination of hardware, software, and/or firmware. For example, various general-purpose machines can be used with programs written in accordance with teachings of the disclosed implementations, or it can be more convenient to construct a specialized apparatus or system to perform the required methods and techniques.
The systems and methods disclosed herein can be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Although ordinal numbers such as first, second, and the like can, in some situations, relate to an order; as used in this document ordinal numbers do not necessarily imply an order. For example, ordinal numbers can be merely used to distinguish one item from another. For example, to distinguish a first event from a second event, but need not imply any chronological ordering or a fixed reference system (such that a first event in one paragraph of the description can be different from a first event in another paragraph of the description).
The foregoing description is intended to illustrate but not to limit the scope of the invention, which is defined by the scope of the appended claims. Other implementations are within the scope of the following claims.
These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.
To provide for interaction with a user, the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including, but not limited to, acoustic, speech, or tactile input.
The subject matter described herein can be implemented in a computing system that includes a back-end component, such as for example one or more data servers, or that includes a middleware component, such as for example one or more application servers, or that includes a front-end component, such as for example one or more client computers having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described herein, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, such as for example a communication network. Examples of communication networks include, but are not limited to, a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
The computing system can include clients and servers. A client and server are generally, but not exclusively, remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and sub-combinations of the disclosed features and/or combinations and sub-combinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations can be within the scope of the following claims.