1. Field of the Invention
The present invention relates to computer systems and methods for detecting new uses of legitimate business flows of websites. It is important for websites to understand the new ways users are using their sites since this can help identify both new legitimate and malicious uses of a website.
2. Background Information
In 2005, 75% of all fraud perpetrated through the internet was initiated through websites and only 25% of online fraud was initiated through email. Because of the success of technologies like firewalls, intrusion prevention systems, and web application security, bad actors are finding more sophisticated ways to steal money and victimize internet users and the owners of websites.
There are many ways criminals can use websites to victimize users or the owners of the websites. Some of these fraud types include stealing money using stolen passwords, selling merchandise that will not be delivered, paying for merchandise with illicit funds (either stolen funds or through fraudulent payment mechanisms like fake cashier's checks), false offers of money (also known as Nigerian scams), soliciting accomplices to do things like receive illicit funds or illicit goods and pass them along to the scammer, spam users with nuisance messages, deliver email or other messages that contain malicious code, etc.
In the past, many of these fraud types were perpetrated by trying to “break in” to the systems or intranets of the targeted companies. By finding holes in VPNs (Virtual Private Networks), firewalls, or databases, fraudsters could steal money or credentials to perpetrate their fraud. Because intrusion protection products have become much more powerful, fraudsters have had to find other ways to make their profits. The next step in the progression was to find bugs in a website's code and use those bugs to perform the illicit activity. Web application security vendors now check website code to find code vulnerabilities that allow fraudsters access to sensitive information so that these vulnerabilities can be addressed.
Because web application security finds the code vulnerabilities on websites, fraudsters have turned to an even more sophisticated methodology for exploiting websites and the users of those websites. Business logic abuse is defined as the abuse of legitimate pages of a website to perpetrate fraud and other illicit behaviors. A simple example of business logic abuse is guessing passwords to steal accounts on websites. By testing passwords on the signin page of a website, the fraudster is using a legitimate website business flow—the signin function—to perpetrate bad activity. Other examples of malicious use of websites through legitimate business flows include the mass registration of accounts (for example to send spam on social network sites or to game incentive programs on financial institution or e-commerce sites), scraping of email addresses and personal information off of social network sites, scraping of financial and personal information off of financial institution websites.
New website behaviors are not always fraudulent. There are cases where website owners want to change the behaviors of users on their site. An example is a website that launches a new feature—that website wants its users to take advantage of the new feature, thereby changing the way the users use the website. Another example is when a particular feature of a website becomes popular because of news coverage. Website owners want to know when new behaviors are occurring on their websites so they can track adoption of features, understand the usage of their site, or determine fraudulent events on their site.
A behavior change detection system is configured to detect changing user behaviors on a website by mapping website session information into numerical vectors and using the vector spaces associated with those vectors to track the changes in website session behaviors. The distance between a vector for a particular session, user, etc. and the exemplar of a normal session, user, etc. are compared to determine how close the actions of the current session, user, etc. is to expected behavior. As the distance from the exemplar vector increases, the likelihood the behavior is a new behavior also increases. As thresholds are met that indicate a session vector deviates enough from the exemplar to indicate new behavior, appropriate actions can be taken to better understand and respond to that behavior.
In one aspect, historical vectors are used to determine the exemplar session vectors for a website. All or a subset of historical vectors can be used.
In another aspect, the distance between a session vector and the exemplar vector is taken into account to determine the likelihood of the current session representing a new behavior for the website.
In accordance with a further aspect, a method for determining a likelihood of a previously unknown use of a website associated with using a computer system that processes data from a website session into a plurality of parameters configured to represent the website session information, and wherein the parameters are combined into a vector in a vector space, the method comprises: mapping the vector into various vector spaces; comparing the vector with other vectors based on the distance between the vector and the other vectors in the various vector spaces; evaluating the vector using a comparison between the other vectors in the same or similar vector spaces; generating a score indicative of the similarity between the vector and the other vectors in the same or similar vector spaces; and returning the score to an investigation system for analysis.
In accordance with another aspect, a method for determining a likelihood of a previously unknown use of a website associated with a website session, comprises: receiving a plurality of parameters associated with an action performed during a website session; creating a session vector that has a dimension corresponding to each of the plurality of parameters associated with the action performed during the website session; creating an exemplar session vector based on other session vectors within a vector space; and comparing the session vector to the exemplar session vector in the various vector spaces.
In accordance with a further aspect, a method of mapping website session data into a vector space comprises: parsing website session data into a plurality of parameters; and mapping the plurality of parameters into n-dimensional vectors, wherein n is a number of parameters available about an action on a website, and wherein each vector is mapped into an n-dimensional space associated with the plurality of parameters related to the action on the website.
In accordance with another aspect, a behavior change detection system comprises: a website data center, which receives a plurality of input parameters associated with website actions; and a behavior change detection center configured to detect behavior changes by users of a website based on: receiving a plurality of input parameters associated with website actions performed during a website session; creating a session vector that has a dimension corresponding to each of the plurality of input parameters associated with the website actions performed during the website session; creating an exemplar session vector based on other session vectors within a vector space; and comparing the session vector to the exemplar session vector in the various vector spaces.
In accordance with a further aspect, a computer readable medium containing a computer program for determining a likelihood of a previously unknown use of a website associated with a website session, wherein the computer program comprises executable instructions for: receiving a plurality of parameters associated with an action performed during a website session; creating a session vector that has a dimension corresponding to each of the plurality of parameters associated with the action performed during the website session; creating a exemplar session vector based on other session vectors within a vector space; and comparing the session vector to a exemplar session vector in the various vector spaces.
These and other features, aspects, and embodiments of the invention are described below in the section entitled “Detailed Description.”
For a better understanding of the nature of the features of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings, in which:
The present invention is directed to a system and method for determining when user behavior on a website changes. In an exemplary embodiment of the invention, website behavior change is detected using feature vectors mapped into vector spaces and compared with other vectors in those spaces to determine anomalous behavior versus typical behavior. Mapping website behavior into vector spaces provides a generalized methodology for building a multi-dimensional representation of user actions on a website. This generalized methodology allows the comparison of the current user, page view, or action on a website with what is known as a exemplar user, page view, or action on a website. By comparing the distance in a vector space between the known typical behavior and the current behavior, decisions can be made as to whether the current behavior deviates in a meaningful way from typical behavior. In the case the current behavior deviates in a meaningful way from typical behavior, alerts can be issued to the appropriate parties. These techniques have proven to be efficient and effective even though the number of possible useful features of given vector spaces will generally be large.
The inventive system operates upon an incoming stream of input data generated by actions on a website. Example actions on a website generally correspond to clicks by the user of the website. These clicks can be done by a human or by an automated computer program. Automated computer programs can work by simulating website clicks or by working through the application programming interface of the website.
Examples of actions taken on websites include clicks to go to other pages of the websites and entering data into forms on the website. Examples of entering data into forms on a website include entering a user name and password on a website to sign-in to the website, filling out an email form to send email to another user of the website, or entering personal information to register for an account on the website.
As described in further detail below, each website action consists of multiple parameters as defined by any information corresponding to the action on the website that can be seen by the processors and computers related to a web server, a firewall, or other device that processes website traffic and additional information provided by the website or third parties. Examples of parameters associated with website actions include IP addresses, including those of any proxies used in the process of sending traffic to the website, browser header information, operating system information, information about other programs installed on the user's machine, information about the clock and other settings on the user's machine, cookies, referring URLs, usernames, parameters associated with a post to the website, and any other information associated with the user's action on the website. Examples of information provided by the website include the length of time the username has been registered, account numbers associated with the username, account balances associated with the username, previous actions performed by the cookie, etc. Examples of data provided by third parties include fraud probabilities associated with internet protocol addresses, geo-location information associated with internet protocol addresses, frequency scores associated with passwords, etc. Any other information that can be seen by the web server, firewall, etc. can be used in this model to map the current action into the vector space.
As each new action on the website occurs, the parameters associated with that action are mapped into several vector spaces. Examples of typical vector spaces include a vector space associated with a user, a vector space associated with a particular page, a vector space associated with a particular referring URL, etc.
Mapping the parameters associated with an action on a website into vector form means creating a vector that has a dimension corresponding to each of the parameters associated with an action on the website. As an action is processed, the web server, firewall, or other transaction processing device receives the information about the action on the website. The inventive system takes the information associated with the action on the website, parses out the specific data associated with each parameter of the action, creates a numerical representative of that data element, and puts that representative of the data element into its corresponding position in the associated vector. The representatives of the data elements are numerical values. In the case a parameter associated with an action is not a numerical value, that parameter is mapped to a numerical value using a hash function or lookup table.
As new actions are fed into the system, the vectors corresponding to those actions are updated with the new parameters associated with that action. For example, when looking at a particular website user, as specified by a userID, cookie, or other values, a sequence of actions on a website are called a user's session. In accordance with an exemplary embodiment, the present invention looks at all of the actions in a particular session to determine if the current session is similar or different to the other sessions on the website, other sessions that use a particular website page, etc. In real-time, or in a batch processing mode that operates on timed increments, for example once an hour, the vectors for each action are computed. In addition, an exemplar vector for users, each page on the website, each referring URL, etc. are created. This exemplar vector could be made up of the average actions by a user or for a page or could be derived using other methodologies to determine an exemplar vector. This exemplar vector may take into account all users, actions, pages, etc. or may only consider a subset of those entities.
To determine new website behavior, a score is computed by comparing the distance between each individual vector and the exemplar vector in the corresponding vector space. If the generated score indicates the individual vector deviates from the exemplar vector in a meaningful way, the appropriate action is taken. Some appropriate actions to take include sending alerts to various website fraud detection systems, sending emails to interested parties, etc.
Turning now to
Referring to
Turning now to
Moving on to
In an exemplary embodiment, a computer program which implements all or parts of the processing described herein through the use of a system and/or methodology as illustrated in
It will be understood that the foregoing description is of the preferred embodiments, and is, therefore, merely representative of the article and methods of manufacturing the same. It can be appreciated that many variations and modifications of the different embodiments in light of the above teachings will be readily apparent to those skilled in the art. Accordingly, the exemplary embodiments, as well as alternative embodiments, may be made without departing from the spirit and scope of the articles and methods as set forth in the attached claims.