The present disclosure generally relates to fraudulent scheme detection using communication devices, and more specifically, to fraudulent scheme detection using time-evolving graphs on communication devices.
Nowadays, with the evolution and proliferation of electronics, digital transactions are becoming commonplace. In some instances, the transactions may be driven by a common actor or organization in which current members of the organization are compensated based in part on an action taken by a new member of the organization. For example, under such a compensation arrangement, current members receive a payment or return funded with capital obtained from new investors in the organization. Thus, the organization does not use earned profits to pay current members, and a misperception is introduced: current members are led to believe that the sale of a product produced a profit that enabled the compensation payment. Such payments and types of investment may yield high rates of return; however, such investments are also often susceptible to quick collapse as recruiting new members or investors becomes difficult.
This type of investment scheme can affect not only the members of the organization but also payment providers, through the loss of monetary funds. Therefore, payment providers often perform manual research to identify risky accounts, calculate statistical metrics, and/or take static snapshots of metrics regarding potential accounts. However, such detection mechanisms may be slow, may lack identification processes, and do not operate in real time. Thus, it would be beneficial to create a system that can detect these types of schemes.
Embodiments of the present disclosure and their advantages are best understood by referring to the detailed description that follows. It should be appreciated that like reference numerals are used to identify like elements illustrated in one or more of the figures, whereas showings therein are for purposes of illustrating embodiments of the present disclosure and not for purposes of limiting the same.
In the following description, specific details are set forth describing some embodiments consistent with the present disclosure. It will be apparent, however, to one skilled in the art that some embodiments may be practiced without some or all of these specific details. The specific embodiments disclosed herein are meant to be illustrative but not limiting. One skilled in the art may realize other elements that, although not specifically described here, are within the scope and the spirit of this disclosure. In addition, to avoid unnecessary repetition, one or more features shown and described in association with one embodiment may be incorporated into other embodiments unless specifically described otherwise or if the one or more features would make an embodiment non-functional.
Aspects of the present disclosure involve systems, methods, devices, and the like for fraudulent scheme detection. In one embodiment, a time-evolving graph-based solution is presented for Ponzi scheme detection. For the solution, unbounded and time-based relational data is transformed into a time-evolving graph structure. Time-based aggregate metrics are computed and captured based in part on changes occurring within the user accounts and transactions identified within the time-evolving graph structure. Then, with the aid of known pattern flows and the application of filtering rules, detection of such a fraudulent scheme may be accomplished.
Ponzi schemes are investment techniques often used to lure members to invest in an organization and then use the invested funds to pay current members. The scheme may be considered a fraudulent operation in which a member is led to believe that a product or service sold by the organization generated the profit used to pay current members, when in reality the member is paid using newly invested funds obtained from a new member or investor. Although such investment schemes may offer high rates of return, they tend to collapse when it becomes difficult to recruit new members or when a large number of members ask to cash out.
To illustrate this, consider
Such a collapse can be problematic not only for investors and the organization, but also for the payment provider. The payment provider's reputation and goodwill may be hurt, and monetary funds may be lost as a result of the collapse. Therefore, it is beneficial to identify and use strategies that aid in the identification of such schemes. Traditionally, payment providers and other involved parties, entities, or investigators have performed such identification manually by flagging risky accounts. Additionally, statistical methods have been used for such identification; however, these methods often target a single account or are static in time.
In one embodiment, a system and method are introduced that aid in the identification of such schemes through a technique that not only evolves based on changes in the accounts, but also maintains time-based metrics that can be used to capture trends and detect potential Ponzi scheme accounts.
Generally, this type of solution entails generating time-based flow graphs constructed from transactions occurring between one or more users and/or accounts over a given time period. Then, with the understanding that Ponzi schemes exhibit a characteristic flow pattern, subgraphs characteristic of a Ponzi scheme are extracted from the time-based flow graphs. Finally, time-based metrics are defined and analyzed to identify whether a Ponzi scheme is occurring.
Turning to
Next, because the method for detecting a Ponzi scheme includes generating a time-evolving graph, a graph application program interface (API) layer is present in and used by the system architecture 200 of
Further to the graph API layer, the time-evolving graph generating, Ponzi scheme detection system architecture 200 can also include a graph computation layer 206. The graph computation layer 206 is a layer in the system architecture 200 that can be used to generate the graph based on data extracted from transactions. For example, at the graph computation layer 206, graph vertices can represent users and edges can represent relationships between users. Alternatively, the graph computation layer can create a graph in which the edges represent transactions and the vertices represent accounts. To generate such graphs, proprietary and/or open source programs may be used (e.g., GraphFrame). In one embodiment, an open source program can be used to provide DataFrame-based APIs for the graph implementation. For example, an open source general-purpose cluster computing framework may be used for data retrieval, using a dataset and interface to organize and create the graphs. The data retrieval may be achieved through the use of a storage layer 208. The storage layer 208 can store the data, including the transactions, accounts, and users associated with the time-evolving graphs to be generated for this Ponzi scheme detection solution.
Note that system architecture 200 is presented for exemplary purposes, and one or more layers may be added and/or removed. Additionally, the storage layer 208 and graph computation layer 206 are not limited to the use of a data store or dataset; alternative solutions may be used, including but not limited to GraphFrame, Hive, Teradata, Neo4j, and OrientDB. Further, additional graph-based APIs can be added to the graph API layer 204. Still further, the system architecture can be extended to detect other fraudulent schemes, such as but not limited to pyramid schemes, multi-level marketing (MLM) schemes, etc.
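As a rough illustration of the account/transaction graph described above, the following Python sketch builds a directed payment graph in which vertices are accounts and edges are payments, and then lists the account pairs that have paid each other in both directions. It uses plain Python data structures rather than GraphFrame or any DataFrame-based API; the `Payment` record and the `build_payment_graph` and `bidirectional_pairs` helpers are hypothetical names introduced for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    """One payment record; the field names are illustrative assumptions."""
    sender: str
    receiver: str
    amount: float
    timestamp: int  # e.g., seconds since epoch


def build_payment_graph(payments):
    """Vertices are accounts; each payment becomes a directed edge.

    Returns the vertex set and an adjacency map: sender -> [(receiver, amount), ...].
    """
    vertices = set()
    out_edges = defaultdict(list)
    for p in payments:
        vertices.update((p.sender, p.receiver))
        out_edges[p.sender].append((p.receiver, p.amount))
    return vertices, dict(out_edges)


def bidirectional_pairs(out_edges):
    """Return unordered account pairs with payments flowing in both directions."""
    pairs = set()
    for src, edges in out_edges.items():
        for dst, _ in edges:
            if any(back == src for back, _ in out_edges.get(dst, [])):
                pairs.add(frozenset((src, dst)))
    return pairs
```

For example, payments A→B, B→A, and C→A yield the vertices {A, B, C} and a single bi-directional pair {A, B}. In a cluster deployment, the same vertex/edge layout would instead be held in DataFrames.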
In an effort to define Ponzi scheme detection in more detail, the graph generation, subgraph extraction, metric determination, and detection are described below and in conjunction with
Using the same criteria, the relationships between more users, businesses, individuals, accounts, etc. can be graphed in the same way as the first account relationship. Therefore, turning to
Note that based on the graphs described above and in conjunction with
Turning to
Note that although a Ponzi scheme subgraph is illustrated here, other relational schemes may be detected using a similar graphing mechanism. Also note that although a follower relationship is used here for Ponzi scheme detection, other relationships may exist and be used for detection; the graph and subgraph 400/450 presented herein are for exemplary purposes, and other graphs and/or subgraphs are possible.
Next, as indicated above, a timing window may be used to identify the accounts or payment relationships from which the subgraphs are generated. The timing window identifies, and limits consideration to, the transactions occurring over the given time span. Therefore, in some embodiments, a sliding time window may be used over varying time intervals. To illustrate how a sliding time window may be used for the generation of time-evolving payment graphs, consider
At
Recall that, as previously indicated, the vertices of the graph may represent accounts, whereas the edges represent payments between the accounts. Thus, during the first hopping time slot, it could generally be stated that, for the time period and/or accounts under consideration, approximately six accounts interacted, with three of those accounts involved in transactions associated with a Ponzi scheme. Next, turning to the next, adjacent hopping time slot 504B, a new set of transactions may have occurred, as indicated on graph 514 by the solid lines.
The graphs thus become time-evolving. At graph 516, corresponding to hopping time slot 504C of the year-long time period 506, the transactions occurring between July and September are considered. Again, because the time-evolving graph 500 accounts for both prior and new account transactions, dashed lines denote previous transactions while solid lines denote new ones. At graph 518, the entire sliding time window 502 has been accounted for, and graph 518 represents a time-evolving graph consisting of the transactions between two or more accounts over that window.
Note that, in some instances, the hopping time slots may be uneven, dynamic, non-adjacent, tunable, etc., based on the scheme and analysis to be performed. In addition, the time period 506, as well as the sliding time window 502, may be longer or shorter than illustrated. For example, the time under consideration may be a holiday season, the summer, or a dynamic period based on transaction loads. Also note that a larger sliding window may represent a longer period of payments, while a smaller sliding window may indicate that smaller hopping time slots will be considered in the analysis.
Now, considering the time-evolving graph, each graph 508 may be considered individually to obtain the desired metrics. Intuitively, accounts that show quick expansion with bi-directional activity between two or more accounts may indicate a possible Ponzi scheme. To determine whether the potential for a Ponzi scheme indeed exists, metrics may be computed for each graph 508 in order to obtain aggregate network-based metrics whose attributes provide an indication of the existence of a Ponzi scheme.
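The hopping-slot mechanics described above can be sketched as follows, assuming adjacent, equal-length, non-overlapping slots; the uneven or non-adjacent variants just mentioned are not handled. Payments are plain `(sender, receiver, amount, timestamp)` tuples, and `partition_into_slots` and `cumulative_edge_sets` are hypothetical helpers. Each cumulative snapshot separates edges from earlier slots (the dashed lines) from edges in the current slot (the solid lines).

```python
from collections import defaultdict


def partition_into_slots(payments, window_start, slot_length, num_slots):
    """Group (sender, receiver, amount, timestamp) tuples by hopping time slot.

    Slots are assumed adjacent, equal-length, and non-overlapping, tiling
    [window_start, window_start + num_slots * slot_length). Payments outside
    the sliding time window are dropped.
    """
    slots = defaultdict(list)
    for sender, receiver, amount, ts in payments:
        idx = (ts - window_start) // slot_length
        if 0 <= idx < num_slots:
            slots[int(idx)].append((sender, receiver, amount))
    return [slots[i] for i in range(num_slots)]


def cumulative_edge_sets(slotted):
    """Time-evolving view: snapshot k holds edges seen in slots 0..k-1
    ('previous', dashed) and edges in slot k ('current', solid)."""
    seen, snapshots = set(), []
    for slot in slotted:
        current = {(s, r) for s, r, _ in slot}
        snapshots.append({"previous": set(seen), "current": current})
        seen |= current
    return snapshots
```

With day-of-year timestamps, `partition_into_slots(payments, 0, 91, 4)` approximates four quarterly hopping slots within a year-long window; a payment on day 400 falls outside the window and is dropped.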
In one embodiment, different kinds of time-based graph metric vectors may be computed for each account or vertex. For example, an N-dimensional vector

$\langle x_1, x_2, \ldots, x_N \rangle$

may be defined, where N is the number of hopping time slots 504 within the sliding time window 502. Each value $x_i$ within the N-dimensional vector can be one of the aggregate metrics calculated from the account (vertex) and the payments (edges) that fall within the $i$-th hopping time slot. Various metrics may then be computed from the values of this vector.
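To make the vector definition concrete, the following sketch computes $\langle x_1, \ldots, x_N \rangle$ for one account from N per-slot payment lists. The per-slot metric is passed in as a function; the `payment_count` metric (the number of payments the account takes part in during a slot) is only an illustrative choice, and all names here are hypothetical.

```python
from typing import Callable, List, Tuple

Payment = Tuple[str, str, float]  # (sender, receiver, amount)


def metric_vector(account: str,
                  slotted: List[List[Payment]],
                  metric: Callable[[str, List[Payment]], float]) -> List[float]:
    """Return [x_1, ..., x_N]: x_i is `metric` applied to the payments
    (edges) that fall in the i-th hopping time slot."""
    return [metric(account, slot) for slot in slotted]


def payment_count(account: str, payments: List[Payment]) -> float:
    """Illustrative per-slot metric: payments sent or received by `account`."""
    return sum(1 for s, r, _ in payments if account in (s, r))
```

Passing a different `metric` function yields a different vector over the same slots, which keeps slot assignment separate from metric definition.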
Turning back for
Note that other types of metrics may be determined and identified in a similar manner. For example, consider instead transactional metrics and/or network metrics. For transactional metrics, a net transaction amount and a total transaction count could also be determined. Transactional metrics could be determined per hopping time slot and per account, using outgoing edges to determine the number of payments sent or, alternatively, incoming edges to determine the payments received. This is similar to the metrics 510 considered above and described in conjunction with
where the growth rate is a function of the N-dimensional vector previously determined.
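The transactional metrics described above might be computed per hopping time slot as in the following sketch. It counts outgoing edges as payments sent and incoming edges as payments received. Defining the net transaction amount as received minus sent is an assumption made for illustration, and `transactional_vectors` is a hypothetical helper.

```python
def transactional_vectors(account, slotted):
    """Per-slot transactional metric vectors for one account.

    `slotted` is a list of N per-slot lists of (sender, receiver, amount).
    Outgoing edges count as payments sent, incoming edges as payments
    received; net amount is assumed to be received minus sent.
    """
    vectors = {"sent_count": [], "received_count": [], "total_count": [], "net_amount": []}
    for slot in slotted:
        outgoing = [a for s, r, a in slot if s == account]
        incoming = [a for s, r, a in slot if r == account]
        vectors["sent_count"].append(len(outgoing))
        vectors["received_count"].append(len(incoming))
        vectors["total_count"].append(len(outgoing) + len(incoming))
        vectors["net_amount"].append(sum(incoming) - sum(outgoing))
    return vectors
```

Each entry of the returned dictionary is one N-dimensional vector, so a growth rate can be computed from any of them.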
As indicated, other metrics are also possible, including network metrics or follower metrics, which can help highlight a property associated with a Ponzi scheme. Generally, a Ponzi scheme account needs to attract new investors in order to use their investment funds as returns to existing members or investors of the organization. Hence, the number of new followers and how fast an account expands its followers are important properties that can be used to flag the possible existence of a Ponzi scheme. Therefore, network metric vectors may be determined and used to determine how fast an account is expanding its payment network; in other words, how many followers an account is gaining, and how quickly. To detect the number of new followers for one account, the number of new one-hop neighbors in each hopping time slot may be determined (see
Like transactional metrics, network metrics may also be used to obtain aggregate metrics. For example, aggregate metrics of followers can be determined and used to show how different a potential Ponzi account is from its followers. Thus, if a potential Ponzi account's net transaction amount grows quickly while its followers' net transaction amounts decrease, an indication of possible risk may be raised. With additional account information available, additional metrics may also be determined and used for risk detection. User IP login, location, region, etc. may also be available and used to compute additional metrics, such as a follower's geographical location distribution. In some Ponzi schemes, accounts exhibit a large number of geographically broad transactions; thus, if the geographical location distribution of an account's followers is known, a Ponzi scheme may be detected. Expanding this a step further, advanced algorithms such as, but not limited to, the label propagation algorithm may be applied to detect communities within payment networks.
Once all metrics are known, Ponzi scheme detection or other high-risk account detection may be accomplished using a set of filters that distinguish the accounts that should be further scrutinized or investigated. For exemplary purposes, some filters that may be used for Ponzi scheme central account detection include: 1) the account contains many bi-directional transactions with many followers, 2) its net transaction amount and total transaction count grow quickly, and 3) it has a large net transaction amount and transaction count, and those amounts are much greater than those of its 1-hop followers, etc.
Note that, further to the filters listed above, other filters and criteria may be added for the detection of a Ponzi scheme or another risky scheme. Further, specific threshold values for each metric may be set or dynamically defined. For example, a 12-month time period may be considered, a growth rate >1.1 may be required, the top 1% of network sizes may be considered, etc. Additional filters may be set and used to fine-tune the detection of a fraudulent scheme, a master account in a Ponzi scheme, etc.
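Counting new one-hop neighbors per hopping time slot, as described above, could look like the following sketch. It treats any account that sent a payment to, or received a payment from, the account of interest as a one-hop neighbor (ignoring direction, which is an assumption), and counts a neighbor as new only in the first slot in which it appears. `new_follower_vector` is a hypothetical name.

```python
def new_follower_vector(account, slotted):
    """Count new one-hop neighbors of `account` in each hopping time slot.

    `slotted` is a list of per-slot lists of (sender, receiver, amount).
    A neighbor is 'new' in slot i if it was not a neighbor in any earlier slot.
    """
    seen, counts = set(), []
    for slot in slotted:
        neighbors = set()
        for s, r, _ in slot:
            if s == account and r != account:
                neighbors.add(r)
            elif r == account and s != account:
                neighbors.add(s)
        counts.append(len(neighbors - seen))
        seen |= neighbors
    return counts
```

A vector such as `[1, 1, 3]` shows the account's payment network expanding faster in the last slot, which is the kind of trend the network metrics are meant to surface.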
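The label propagation algorithm mentioned above can be illustrated with a basic sequential variant on an undirected payment graph: every account starts with its own label and repeatedly adopts the label most common among its neighbors until no label changes. This sketch breaks ties by choosing the smallest label so that its output is deterministic; that tie-breaking rule, the iteration cap, and the `label_propagation` helper are illustrative choices, not necessarily how a production system would apply the algorithm.

```python
from collections import Counter


def label_propagation(edges, max_iter=100):
    """Basic label propagation for community detection.

    `edges` is an iterable of (u, v) account pairs treated as undirected.
    Vertices are visited in sorted order; each adopts the most frequent
    neighbor label (smallest label on ties). Stops when no label changes
    or after `max_iter` sweeps. Returns a mapping vertex -> community label.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    labels = {v: v for v in neighbors}
    for _ in range(max_iter):
        changed = False
        for v in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[v])
            top = max(counts.values())
            best = min(label for label, c in counts.items() if c == top)
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels
```

On two disconnected triangles of accounts, for instance, each triangle converges to a single shared label, so accounts that transact densely among themselves end up in the same community.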
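A minimal sketch of the example filters follows, using the illustrative thresholds above (growth rate >1.1 and top 1% network size). The growth rate is assumed here to be the compound per-slot rate $(x_N / x_1)^{1/(N-1)}$, a hypothetical stand-in rather than the disclosure's own definition. The metric dictionary keys and all helper names are likewise hypothetical.

```python
def compound_growth_rate(vector):
    """Hypothetical growth rate: (x_N / x_1) ** (1 / (N - 1)).

    Returns None when undefined (fewer than two slots, x_1 <= 0, or x_N < 0).
    """
    if len(vector) < 2 or vector[0] <= 0 or vector[-1] < 0:
        return None
    return (vector[-1] / vector[0]) ** (1.0 / (len(vector) - 1))


def top_percent_cutoff(values, percent):
    """Smallest value that still falls within the top `percent` of `values`."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * percent / 100))
    return ranked[k - 1]


def flag_central_accounts(metrics, min_growth=1.1, top_percent=1.0, min_bidirectional=1):
    """Apply the example filters to precomputed per-account metrics.

    `metrics` maps account -> {'net_amount': [...], 'total_count': [...],
    'network_size': int, 'bidirectional': int} (hypothetical keys).
    """
    cutoff = top_percent_cutoff([m["network_size"] for m in metrics.values()], top_percent)
    flagged = []
    for account, m in metrics.items():
        amount_growth = compound_growth_rate(m["net_amount"])
        count_growth = compound_growth_rate(m["total_count"])
        if (amount_growth is not None and amount_growth > min_growth
                and count_growth is not None and count_growth > min_growth
                and m["network_size"] >= cutoff
                and m["bidirectional"] >= min_bidirectional):
            flagged.append(account)
    return sorted(flagged)
```

Keeping the thresholds as keyword arguments reflects the note above that threshold values may be set or dynamically defined.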
To illustrate how the time-evolving graph may be used for Ponzi scheme detection,
Process 600 may begin with operation 602, where a request is received to detect whether a fraudulent account exists among a plurality of accounts. The request may be received and initiated by a machine or device working through a platform and dashboard where details regarding the time period, the desired metrics, and the filters to apply for detection may be specified. Additionally, other parameters may be set, including the type of detection scheme desired, the user of interest, geolocation, etc. As previously indicated, account information, transactional data, users, payment information, etc. may be collected and available for retrieval from data storage 208 as illustrated in conjunction with
Once the request and data are retrieved, process 600 continues to operation 604, where the time-evolving graph process is initiated. To generate the time-evolving graph, a timing window and corresponding hopping time slots are determined. As previously indicated, a timing window can be a time period under consideration that may be used to identify the accounts or payment relationships targeted for investigation and Ponzi scheme detection. The hopping time slots are the smaller windows from which the subgraphs are generated and from which metrics may be obtained. In particular, as previously indicated, the hopping time slots can be adjacent, even-sized, non-overlapping windows. The hopping time slots further define the subset of payments that occur within the sliding time window and fall within a given hopping time slot, on which a graph-based computation will be performed.
Once the time period and intervals are designated (e.g., through the timing window and hopping time slots), process 600 continues to operation 606, where graphs are generated illustrating the transactions occurring between two or more accounts (designated accounts, random accounts, or generally accounts that transact during the sliding time window). The graph can be generated by assigning each account name, user, business, entity, etc. a vertex and using an edge to show an association, much like that described in greater detail above and in conjunction with
Identifying vector metrics at operation 610 can include a series of analyses in which transactional metrics, network metrics, and aggregate metrics may be determined based on flow patterns and communications between accounts. As described above, vector metrics can include metrics determined per hopping time slot and per account, using outgoing edges to determine the number of payments sent or, alternatively, incoming edges to determine the payments received. Additionally, vector metrics can include vectors used to determine how fast an account is expanding its payment network, that is, how many followers the account is gaining and how quickly. To detect the number of new followers for one account, the number of new one-hop neighbors in each hopping time slot may be determined. Aggregate metrics can include growth rates and can indicate centralized accounts based in part on how fast an account is expanding its payment network.
Next, at operation 612, once the vectors have been determined, a decision is made as to whether filtering should be applied. For example, if the purpose of the analysis is to determine whether the account under consideration is a centralized account, then filters may be added and used to detect whether certain predefined parameters are met. Exemplary parameters can include whether the account contains many bi-directional transactions with many followers, whether its net transaction amount and total transaction count grow quickly, and whether its growth rate is >1.1. Other filter parameters and threshold values may be set and tuned, and may vary depending on the type of scheme and detection being considered. If filtering is used, operation 612 continues to operation 616, where the accounts and transactions that do not fit the criteria are removed from consideration. Then, given the remaining information, a decision may be made, and the account may be flagged if it is determined to be a centralized account. Alternatively, if no filter is needed, process 600 may continue to operation 614, where, if no accounts remain or are under suspicion, no fraudulent account is detected.
Note that, in some instances, even without filtering, an account may be flagged as suspicious if the metrics determined at operation 610 are sufficient to detect one. Additionally, after filtering at operation 616, there may be instances where no accounts are flagged, and thus operation 616 instead terminates at operation 614. Also note that the order and number of operations listed are for exemplary purposes only, and more or fewer operations may be possible. The operations may also be reordered and combined. For example, determining the sliding time window may be a distinct operation from determining the corresponding hopping time slots. As another example, there may be no need to decide whether a subgraph is needed; instead, subgraph extraction may occur as a sequential operation as opposed to a decision. Other similar examples and arrangements of operations may be contemplated.
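Operations 604 through 616 can be strung together in a compact, hypothetical sketch. It assumes equal-length hopping slots, uses per-slot net amount (received minus sent) and new one-hop neighbor counts as the metric vectors, and applies one illustrative filter: the net amount must grow by more than `min_growth` from each slot to the next, and the account must gain new neighbors in every slot. An empty result corresponds to operation 614 (no fraudulent account detected). Apart from the 1.1 threshold example, these choices are assumptions rather than details taken from process 600.

```python
def detect_fraudulent_accounts(payments, window_start, slot_length, num_slots, min_growth=1.1):
    """Hypothetical end-to-end sketch of process 600.

    payments: iterable of (sender, receiver, amount, timestamp) tuples.
    Returns sorted flagged account IDs; an empty list means no fraudulent
    account was detected (operation 614).
    """
    # Operation 604: assign payments to adjacent, equal-length hopping slots.
    slotted = [[] for _ in range(num_slots)]
    for s, r, amount, ts in payments:
        idx = (ts - window_start) // slot_length
        if 0 <= idx < num_slots:
            slotted[int(idx)].append((s, r, amount))

    # Operation 606: vertices are the accounts appearing on any payment edge.
    accounts = {a for slot in slotted for s, r, _ in slot for a in (s, r)}

    flagged = []
    for acct in accounts:
        # Operation 610: per-slot net amount and new one-hop neighbor counts.
        net, new_neighbors, seen = [], [], set()
        for slot in slotted:
            received = sum(a for s, r, a in slot if r == acct)
            sent = sum(a for s, r, a in slot if s == acct)
            net.append(received - sent)
            nbrs = {r if s == acct else s for s, r, _ in slot if acct in (s, r) and s != r}
            new_neighbors.append(len(nbrs - seen))
            seen |= nbrs

        # Operations 612/616: illustrative filter on growth and network expansion.
        grows = len(net) > 1 and all(
            prev > 0 and cur / prev > min_growth for prev, cur in zip(net, net[1:]))
        expanding = all(n > 0 for n in new_neighbors)
        if grows and expanding:
            flagged.append(acct)
    return sorted(flagged)
```

In a toy window where account P receives 100 from a growing set of new investors in each slot while paying small returns to earlier ones, P's net amount rises from 100 to 180 to 380 and it gains new neighbors in every slot, so it is the only account flagged.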
Additionally, as more and more devices become communication capable, such as new smart devices using wireless communication to report, track, message, relay information and so forth, these devices may be part of computer system 700. For example, windows, walls, and other objects may double as touch screen devices for users to interact with. Such devices may be incorporated with the systems discussed herein.
Computer system 700 may include a bus 710 or other communication mechanisms for communicating data, signals, and information between various components of computer system 700. Components include an input/output (I/O) component 704 that processes a user action, such as selecting keys from a keypad/keyboard or selecting one or more buttons, links, actuatable elements, etc., and sends a corresponding signal to bus 710. I/O component 704 may also include an output component, such as a display 702, and a cursor control 708 (such as a keyboard, keypad, mouse, touchscreen, etc.). In some examples, I/O component 704 may communicate with other devices, such as another user device, a merchant server, an email server, an application service provider, a web server, a payment provider server, and/or other servers via a network. In various embodiments, such as for many cellular telephone and other mobile device embodiments, this transmission may be wireless, although other transmission mediums and methods may also be suitable. A processor 718, which may be a micro-controller, digital signal processor (DSP), or other processing component, processes these various signals, such as for display on computer system 700 or transmission to other devices over a network 726 via a communication link 724. Again, communication link 724 may be a wireless communication in some embodiments. Processor 718 may also control transmission of information, such as cookies, IP addresses, images, and/or the like, to other devices.
Components of computer system 700 also include a system memory component 712 (e.g., RAM), a static storage component 714 (e.g., ROM), and/or a disk drive 716. Computer system 700 performs specific operations by processor 718 and other components executing one or more sequences of instructions contained in system memory component 712 (e.g., for engagement level determination). Logic may be encoded in a computer readable medium, which may refer to any medium that participates in providing instructions to processor 718 for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and/or transmission media. In various implementations, non-volatile media includes optical or magnetic disks, volatile media includes dynamic memory such as system memory component 712, and transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise bus 710. In one embodiment, the logic is encoded in a non-transitory machine-readable medium. In one example, transmission media may take the form of acoustic or light waves, such as those generated during radio wave, optical, and infrared data communications.
Some common forms of computer readable media include, for example, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer is adapted to read.
Components of computer system 700 may also include a short-range communications interface 720. Short range communications interface 720, in various embodiments, may include transceiver circuitry, an antenna, and/or waveguide. Short range communications interface 720 may use one or more short-range wireless communication technologies, protocols, and/or standards (e.g., Wi-Fi, Bluetooth®, Bluetooth Low Energy (BLE), infrared, NFC, etc.).
Short range communications interface 720, in various embodiments, may be configured to detect other devices (e.g., user device, merchant device, server, laptop, smart device, etc.) with short range communications technology near computer system 700. Short range communications interface 720 may create a communication area for detecting other devices with short range communication capabilities. When other devices with short range communications capabilities are placed in the communication area of short range communications interface 720, short range communications interface 720 may detect the other devices and exchange data with the other devices. Short range communications interface 720 may receive identifier data packets from the other devices when in sufficiently close proximity. The identifier data packets may include one or more identifiers, which may be operating system registry entries, cookies associated with an application, identifiers associated with hardware of the other device, and/or various other appropriate identifiers.
In some embodiments, short range communications interface 720 may identify a local area network using a short-range communications protocol, such as Wi-Fi, and join the local area network. In some examples, computer system 700 may discover and/or communicate with other devices that are a part of the local area network using short range communications interface 720. In some embodiments, short range communications interface 720 may further exchange data and information with the other devices that are communicatively coupled with short range communications interface 720.
In various embodiments of the present disclosure, execution of instruction sequences to practice the present disclosure may be performed by computer system 700. In various other embodiments of the present disclosure, a plurality of computer systems 700 coupled by communication link 724 to the network (e.g., a LAN, WLAN, PSTN, and/or various other wired or wireless networks, including telecommunications, mobile, and cellular phone networks) may perform instruction sequences to practice the present disclosure in coordination with one another. Modules described herein may be embodied in one or more computer readable media or be in communication with one or more processors to execute or process the techniques and algorithms described herein.
A computer system may transmit and receive messages, data, information and instructions, including one or more programs (i.e., application code) through a communication link 724 and a communication interface. Received program code may be executed by a processor as received and/or stored in a disk drive component or some other non-volatile storage component for execution.
Where applicable, various embodiments provided by the present disclosure may be implemented using hardware, software, or combinations of hardware and software. Also, where applicable, the various hardware components and/or software components set forth herein may be combined into composite components comprising software, hardware, and/or both without departing from the spirit of the present disclosure. Where applicable, the various hardware components and/or software components set forth herein may be separated into sub-components comprising software, hardware, or both without departing from the scope of the present disclosure. In addition, where applicable, it is contemplated that software components may be implemented as hardware components and vice-versa.
Software, in accordance with the present disclosure, such as program code and/or data, may be stored on one or more computer readable media. It is also contemplated that software identified herein may be implemented using one or more computers and/or computer systems, networked and/or otherwise. Where applicable, the ordering of various steps described herein may be changed, combined into composite steps, and/or separated into sub-steps to provide features described herein.
The foregoing disclosure is not intended to limit the present disclosure to the precise forms or particular fields of use disclosed. As such, it is contemplated that various alternate embodiments and/or modifications to the present disclosure, whether explicitly described or implied herein, are possible in light of the disclosure. For example, the above embodiments have focused on the user and user device; however, a customer, a merchant, or a service or payment provider may otherwise be presented with tailored information. Thus, "user" as used herein can also include charities, individuals, and any other entity or person receiving information. Having thus described embodiments of the present disclosure, persons of ordinary skill in the art will recognize that changes may be made in form and detail without departing from the scope of the present disclosure. Thus, the present disclosure is limited only by the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2018/122662 | 12/21/2018 | WO | 00 |