Phishing is the attempt to acquire sensitive data, such as credit card numbers, login credentials, social security numbers and the like, for malicious purposes. Phishing often includes masquerading as a trustworthy entity in an electronic communication such as an email or text message. Such trustworthy entities or brands may include banks (Chase, HSBC, Bank of America, BNP Paribas and the like), online payment services (PayPal, Apple Pay), email service providers (Gmail, Yahoo!, British Telecom, T-Online and the like), social networks (Facebook, LinkedIn), e-commerce websites (Amazon, Alibaba), etc.
Phishing scams typically comprise several consecutive steps. In the following example, a worst-case scenario is contemplated, in which the intended victim is induced into compromising his or her confidential login information.
1. At the outset, the phisher sets up a counterfeit website by installing a phishing kit. A phishing kit may include website development software, complete with graphics, code and content that can be used to create convincing imitations of legitimate websites. This counterfeit website mimics a well-known legitimate website and is designed to capture the sensitive login, personal and/or financial data of its victims.
2. The phisher sends out a phishing campaign using a selected electronic communication modality (email, text message, etc.). The phishing message at the heart of the phishing campaign may comprise text, graphics and/or other content intended to fool the user into believing that the originator of the phishing message is legitimate, and to induce the victim to click on a fraudulent Uniform Resource Locator (URL) that leads the victim not to a legitimate website but to a look-alike, fraudulent website.
3. The victim receives the phishing message, and clicks on the fraudulent URL. The user's browser opens the fraudulent website and the victim, believing that the fraudulent website is actually legitimate, submits the requested credentials, usually login credentials or banking details. As shown in
4. When the victim submits his credentials, as suggested at 106 in
5. The fraudulent website programmatically forwards the generated email 200, containing the sensitive, stolen user credential data, over a computer network 210 to the phisher, typically to a mailbox set up to receive the stolen information. Such a mailbox is commonly termed a “phishing dropbox”, as shown at 212.
It is a very common practice for phishers to collect the stolen credentials by email. The sensitive data is usually not stored on the fraudulent website to which the unsuspecting user was led, because the website may be identified by the hosting company as being fraudulent and shut down at any moment. As soon as the victim submits sensitive data on the fraudulent website, an email is sent to a specific email address, the phishing dropbox. The phisher then periodically fetches all emails delivered to the phishing dropbox and collects the stolen personal information. This stolen personal information may then be used to defraud both the user and the company whose website was spoofed, and/or it may be aggregated and sold on a digital black market for use by other bad actors.
Phishing dropboxes such as shown at 212 are usually created on free webmail services such as Gmail, Yahoo! or AOL. Creating an email account on a free webmail service is very easy and, importantly for thieves, does not require proper identification to activate the account. This is a significant differentiator: because their activity is illegal, phishers need to remain anonymous.
One method of deterring and interdicting phishers includes the detection of these phishing dropboxes. Such detection enables the victim to be identified and enables the crime to be reported to both the identified victim and to the company/brand associated with the spoofed website. The detection of these phishing dropboxes also enables the free webmail provider to be notified, to allow them to prevent phishers from continuing their use of the free webmail service in furtherance of their crimes. Indeed, detecting these phishing dropboxes allows the email host to close down the email account of the phisher, since the purpose to which the phisher's email account is put is to defraud users, which violates the terms of use to which the phisher agreed when setting up his or her email account. Toward that end, one embodiment provides free webmail providers with the information they need to identify phishing dropboxes, which phishing dropboxes can then be shut down as soon as they are identified.
One embodiment comprises generating and submitting, to the free webmail providers, data structures called markers. The markers contain actionable information that enables the free webmail providers (for example) to filter their inbound Simple Mail Transfer Protocol (SMTP) traffic for the information in such markers, thereby allowing them to identify any phishing dropboxes on their service and to thereafter shut them down.
The most common phishing use case, the theft of login credentials, is detailed herein. Enabling mail providers to identify and shut down phishing dropboxes, according to one embodiment, begins with the generation of a marker. A marker, according to one embodiment, may comprise one or more of the following features:
Detailed below are a few examples of markers for different brands/products/companies, according to one embodiment. For example, the free webmail service provider Gmail requires users to log in using a Gmail address and a password. The constraints placed on Gmail login credentials include the following: the email address must contain the gmail.com domain and the password must be a sequence of at least 8 ASCII characters.
JSON (JavaScript Object Notation) is an open-standard, language-independent data format that uses human-readable text to transmit data objects consisting of attribute-value pairs. It is the most common data format used for asynchronous browser/server communication (AJAJ), largely replacing XML. A JSON-formatted example of a Gmail marker, according to one embodiment, is provided below:
According to one embodiment, a marker may comprise an identification of the brand/product/company and two (or more) elements. The brand, in this case, is Gmail and the first element of the Gmail-specific marker is a login and the second element is a password. The first element, in this example, is a made-up but properly-constructed email address; namely, angelica.gomes63718@gmail.com (which is of the type and format expected by Gmail) and the second element of the marker is a password that may be both randomly-generated and that satisfies all Gmail-mandated constraints. In this example, the value of the randomly-generated password is Xe4U89@df$r092wt5. The password itself has high entropy, in that the probability that such data exists elsewhere is very low. The combination of the made-up login angelica.gomes63718@gmail.com and the randomly-generated password Xe4U89@df$r092wt5 has even higher entropy, meaning that it is exceedingly unlikely that a legitimate user shares the same login/password pair as the made-up, fake credentials consisting of angelica.gomes63718@gmail.com and Xe4U89@df$r092wt5.
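By way of illustration only, a marker of the kind just described might be assembled as in the following Python sketch. The field names ("brand", "elements", "type", "value"), the function name and the character set used for the password are assumptions made for this example and are not taken from the marker format of any particular embodiment.

```python
import json
import secrets
import string


def make_gmail_marker(local_part: str) -> dict:
    """Build a Gmail-style marker: a brand plus a login and a password element.

    The dictionary layout is hypothetical; only the login/password values
    follow the constraints stated in the text.
    """
    # Assumed alphabet: ASCII letters, digits and a few ASCII punctuation marks.
    alphabet = string.ascii_letters + string.digits + "@$#!%"
    # 17 random ASCII characters comfortably exceed the 8-character minimum.
    password = "".join(secrets.choice(alphabet) for _ in range(17))
    return {
        "brand": "gmail",
        "elements": [
            {"type": "login", "value": f"{local_part}@gmail.com"},
            {"type": "password", "value": password},
        ],
    }


marker = make_gmail_marker("angelica.gomes63718")
print(json.dumps(marker, indent=2))
```

The secrets module is used rather than random so that the generated password is drawn from a cryptographically strong source, in keeping with the high-entropy requirement discussed above.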
Another example is presented herewith, with respect to the e-commerce website Amazon.com. This e-commerce site requires an email address or a phone number and a password for login. A password that is acceptable to Amazon is any sequence of at least 8 characters and at most 128 characters. Allowed characters are letters, digits and the following special characters: !@#$%^&*()_+-=[]{}|'. An example of a marker data structure suitable for Amazon, in JSON, is given below:
The marker, in this case, comprises the brand amazon and the fake, made-up random login/password pair. The login may comprise an email address, and the password is the programmatically and randomly-generated string hx#418+jKtr0984 (which satisfies the Amazon-mandated constraint of being at least 8 characters and at most 128 characters in length, including special characters). The combination satisfies the Amazon constraints for login purposes, yet is highly unlikely to be the same as anyone's legitimate Amazon login credentials.
Not all login credentials consist of an email address and a password. For example, the online banking service Société Générale requires customers to log in using a customer ID and a Personal Identification Number (PIN) code. Société Générale places the following constraints on its customer IDs and PIN codes: the customer ID must be a sequence of 8 digits and the PIN code must be a sequence of 6 digits.
An example of a JSON-formatted marker suitable for Société Générale, according to one embodiment, is shown below:
As shown, the marker identifies the brand “societegenerale”, and defines type/value pairs for both the customer ID and the PIN code. The JSON-formatted marker, in this manner, provides a uniform data structure for storing fake login credentials that may thereafter be used to identify and shut down phishing dropboxes. The programmatic generation of markers and the uniformity of their structure render this solution highly scalable and suitable for widespread adoption across the enterprise.
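A marker whose elements are not an email address and a password can be produced in the same uniform shape. The following Python sketch generates an 8-digit customer ID and a 6-digit PIN code; the type labels "customer_id" and "pin" and the function names are hypothetical, chosen only for illustration.

```python
import secrets


def random_digits(n: int) -> str:
    """Return a string of n random decimal digits."""
    return "".join(secrets.choice("0123456789") for _ in range(n))


def make_societegenerale_marker() -> dict:
    """Build a marker with an 8-digit customer ID and a 6-digit PIN code."""
    return {
        "brand": "societegenerale",
        "elements": [
            {"type": "customer_id", "value": random_digits(8)},  # assumed label
            {"type": "pin", "value": random_digits(6)},          # assumed label
        ],
    }


sg_marker = make_societegenerale_marker()
print(sg_marker)
```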
According to one embodiment, markers may be programmatically generated and the constituent elements thereof injected into the fraudulent websites pointed to by the URLs in phishing messages (e.g., emails or other forms of electronic messages) received by customers and users. According to one embodiment, a computer-implemented method may include obtaining the URL of at least one fraudulent website. Toward that end, one embodiment may include obtaining a list comprising a plurality of known fraudulent websites. For each fraudulent website, a brand-specific, company-specific or otherwise personalized marker may be generated and the constituent elements thereof (including the fake credentials) programmatically provided to the fraudulent website, by submitting the made-up user credentials stored in the marker to the username/customer ID (and the like) field and to password fields, or functionally similar input fields. Thereafter, the same markers provided to the fraudulent websites may be published. Such publication may include sending markers to, for example, the brand, the free email hosting company, the customer and, optionally, others such as law enforcement. Once in possession of this information, the recipients may identify the phishing dropboxes and/or take corrective action. For example, the free webmail provider may cancel the identified dropbox and the user may change his or her login information, now that the previous login information has been compromised.
One embodiment may include downloading a list of fraudulent websites from a third party such as http://www.isitphishing.org. For each fraudulent website, at least the following information may be provided:
Indeed, the list of fraudulent websites assumes the following: each record in the list is a fraudulent website that is identified by a URL, is associated with exactly one IP address (thanks to the DNS resolution of the URL), is associated with exactly one brand/product/company and has a status flag that indicates that the site is online.
Below is an example of a JSON-formatted data structure comprising a list of such fraudulent websites pointed to by URLs in phishing messages:
The first phishing URL listed in this data structure points to a fraudulent website that spoofs the paypal.com website and is currently online at IP address 245.67.189.13. The second phishing URL listed in this data structure is a fraudulent amazon website that is currently online at IP address 167.200.10.45.
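A list of this kind might be loaded and checked as in the following Python sketch. The key names ("url", "ip", "brand", "status") and the URLs are placeholders assumed for this example; only the two IP addresses and brands come from the description above.

```python
import json

# Hypothetical feed; the URLs are placeholders, not real phishing URLs.
feed = """
[
  {"url": "http://paypal-phish.example/login", "ip": "245.67.189.13",
   "brand": "paypal", "status": "online"},
  {"url": "http://amazon-phish.example/signin", "ip": "167.200.10.45",
   "brand": "amazon", "status": "online"}
]
"""

REQUIRED_KEYS = {"url", "ip", "brand", "status"}


def load_online_sites(raw: str) -> list[dict]:
    """Parse the list and keep only well-formed records flagged as online."""
    records = json.loads(raw)
    for record in records:
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
    return [r for r in records if r["status"] == "online"]


sites = load_online_sites(feed)
for site in sites:
    print(site["brand"], site["ip"])
```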
The inclusion of the IP address is significant as, according to one embodiment, more than one marker should not be submitted to the same IP address. Indeed, submission of more than one marker to a single IP address may trigger an identification, by the phisher, of the credentials as being illegitimate and submitted by, for example, a security vendor. It is quite common that fraudulent websites try to detect robots developed by security vendors by checking the number of HTTP connections coming from the same IP address.
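The one-marker-per-IP-address rule can be enforced with a simple filter applied before any injection takes place. The sketch below is one possible way to do so; the record layout and the function name are assumptions made for illustration, and the URLs are placeholders.

```python
def select_targets(sites, already_injected=()):
    """Return at most one site per IP address, skipping IPs already used."""
    seen = set(already_injected)
    targets = []
    for site in sites:
        if site["ip"] in seen:
            continue  # a marker was (or will be) submitted to this IP already
        seen.add(site["ip"])
        targets.append(site)
    return targets


candidate_sites = [
    {"url": "http://a.example/login", "ip": "245.67.189.13"},
    {"url": "http://b.example/login", "ip": "245.67.189.13"},  # same host as above
    {"url": "http://c.example/login", "ip": "167.200.10.45"},
]
targets = select_targets(candidate_sites)
print([t["url"] for t in targets])
```

Passing the set of previously used IP addresses as already_injected lets the same rule hold across successive runs.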
Next, a marker specific to the identified fraudulent website may be generated. This marker, according to one embodiment, may be generated to satisfy all of the constraints specified by the legitimate website being spoofed by the fraudulent website. The marker may be generated by different methods that use, for example, random-number generators, dictionaries, cryptographic hashing algorithms and the like. However generated, the marker to be submitted to the fraudulent website pointed to by the URL in the phishing message may be configured to satisfy the pre-existing constraints placed on legitimate login credentials on the legitimate website.
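One way to keep generation tied to the constraints of the spoofed website is to describe each constraint as data and to check every generated value against it. The Python sketch below does this for the Amazon password rule given earlier (8 to 128 characters drawn from letters, digits and the listed special characters). The Constraint class, its field names and the default length of 15 are assumptions made for this example.

```python
import secrets
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Hypothetical description of what a legitimate site accepts for a field."""
    min_length: int
    max_length: int
    alphabet: str


# Amazon password rule as stated in the text.
AMAZON_PASSWORD = Constraint(
    min_length=8,
    max_length=128,
    alphabet=string.ascii_letters + string.digits + "!@#$%^&*()_+-=[]{}|'",
)


def satisfies(value: str, c: Constraint) -> bool:
    """Check a candidate value against a constraint."""
    return (c.min_length <= len(value) <= c.max_length
            and all(ch in c.alphabet for ch in value))


def generate_value(c: Constraint, length: int = 15) -> str:
    """Generate a random value of the requested length within the constraint."""
    if not c.min_length <= length <= c.max_length:
        raise ValueError("requested length violates the constraint")
    return "".join(secrets.choice(c.alphabet) for _ in range(length))


password = generate_value(AMAZON_PASSWORD)
print(password, satisfies(password, AMAZON_PASSWORD))
```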
According to one embodiment, a scenario (including a pre-defined sequence of steps or actions) may be used to submit the marker elements to the fraudulent website. The scenario may be generic or specific to the brand, company or organization spoofed by the fraudulent website. The use of a generic scenario is possible in many cases, as a significant proportion of login scenarios are similar to one another. For example, it is very common that the login process may be carried out by filling login and password HTML input fields, and by submitting the resultant HTML form.
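The field-detection part of such a generic scenario can be sketched without a browser. The Python example below uses the standard-library HTML parser, not web driver technology, to locate the first text or email input and the first password input of a page and to build the form body that would be submitted. The class and function names, the sample page and its field names are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class LoginFormParser(HTMLParser):
    """Record the form action and the names of the login and password inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.login_field = None
        self.password_field = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action") or ""
        elif tag == "input":
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            if not name:
                return
            if kind == "password" and self.password_field is None:
                self.password_field = name
            elif kind in ("text", "email") and self.login_field is None:
                self.login_field = name


def build_submission(html: str, login: str, password: str):
    """Return (form action, URL-encoded body) for a generic login form."""
    parser = LoginFormParser()
    parser.feed(html)
    if parser.login_field is None or parser.password_field is None:
        # No generic form found; a brand-specific scenario may be needed.
        raise ValueError("no generic login form found")
    body = urlencode({parser.login_field: login, parser.password_field: password})
    return parser.action, body


PAGE = ('<form action="/post.php" method="post">'
        '<input type="email" name="user">'
        '<input type="password" name="pass">'
        '<input type="submit" value="Sign in"></form>')

action, body = build_submission(PAGE, "aaronsmith89@yahoo.com", "hX#418+jKtr0984")
print(action, body)
```

When no such pair of fields is found, the function raises an error rather than guessing, which is the point at which a brand-specific scenario would take over.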
In contrast to a generic scenario, a scenario is brand-specific if the brand login page requires interactions that are specific to the brand/product/company. Such brand-specific interactions may include, for example:
For example,
Another example is the Société Générale login page shown at 500 in
According to one embodiment, scenarios as disclosed herein may be executed using web driver technology. Web driver technology allows a web browser (Google Chrome, Mozilla Firefox, Safari) to be programmatically controlled. An example of such technology is Selenium WebDriver, available from www.seleniumhq.org, and that can be controlled by popular programming languages, including Java, Python, Ruby, Perl and C#.
According to one embodiment, a number of actions may be carried out on a fraudulent website using web driver technology. These actions include, for example, publishing markers submitted to fraudulent websites.
As noted above, marker elements submitted to fraudulent websites may be published or otherwise provided to free webmail providers. Brand-specific markers submitted to fraudulent websites will also be published to the concerned brand, company or organization. Along with the marker, the date on which the marker elements were injected into the fraudulent website may also be provided, along with the IP address of the fraudulent website.
An example of a data structure with which a marker may be published to Amazon.com is shown below:
Here, the publication of the marker provides the free webmail provider (and amazon.com and/or others) with the details of the marker elements submitted to the fraudulent websites. In this illustrative case, the marker submitted included elements corresponding to a login/password pair of (aaronsmith89@yahoo.com, hX#418+jKtr0984). The fraudulent websites into which this marker was injected are also detailed, by date, time, URL and IP address. In this case, the (aaronsmith89@yahoo.com, hX#418+jKtr0984) marker was submitted to three different fraudulent websites, namely amazon_phishing_url_1.com, amazon_phishing_url_2.com and amazon_phishing_url_3.com, at three different IP addresses.
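A published marker of this kind could be assembled as in the following Python sketch. The key names, the timestamps and the pairing of each URL with a particular IP address are assumptions made for illustration; the login, password, URLs and IP addresses are those of the example discussed herein.

```python
import json
from datetime import datetime, timezone


def build_publication(brand, login, password, injections):
    """Assemble a marker together with the record of where it was injected."""
    return {
        "brand": brand,
        "elements": [
            {"type": "login", "value": login},
            {"type": "password", "value": password},
        ],
        "injections": [
            {"date": when.isoformat(), "url": url, "ip": ip}
            for when, url, ip in injections
        ],
    }


# Placeholder timestamps; the URL-to-IP pairing is assumed.
t0 = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
publication = build_publication(
    "amazon",
    "aaronsmith89@yahoo.com",
    "hX#418+jKtr0984",
    [
        (t0, "amazon_phishing_url_1.com", "32.190.45.241"),
        (t0, "amazon_phishing_url_2.com", "119.93.230.12"),
        (t0, "amazon_phishing_url_3.com", "230.38.137.145"),
    ],
)
print(json.dumps(publication, indent=2))
```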
Providing the details of the marker elements submitted to these websites, according to one embodiment, enables the free webmail providers to take appropriate action. Such appropriate action, in most cases, will include shutting down the webmail account of the phishing dropbox. The free webmail provider, using the published information, will then be able to detect phishing dropboxes. This may be carried out by, for example, filtering the inbound SMTP traffic and looking for the information in the published markers. In the previous example, the free webmail provider will identify phishing dropboxes by looking for inbound SMTP traffic that contains aaronsmith89@yahoo.com and hX#418+jKtr0984. Free webmail providers may also identify phishing dropboxes by inspecting inbound SMTP traffic coming from the IP addresses where the markers have been injected. In the previous example, SMTP traffic from 32.190.45.241, 119.93.230.12 and 230.38.137.145 would be considered to be highly suspect, since these websites have previously been identified as fraudulent websites that spoof established, well-known legitimate websites. Furthermore, according to one embodiment, the inspection may be refined using the time of injection, thanks to the date field in the published markers.
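On the provider side, the content-based check described above amounts to searching inbound messages for every element of a published marker and noting the recipient. The following Python sketch shows the idea on raw message text using the standard email package; it handles only simple, non-multipart messages, and the function name, the sample messages and the dropbox address are hypothetical.

```python
from email import message_from_string


def find_dropboxes(raw_messages, marker_values):
    """Return recipient addresses of messages containing all marker values."""
    dropboxes = set()
    for raw in raw_messages:
        msg = message_from_string(raw)
        payload = msg.get_payload()
        # Only plain, single-part bodies are inspected in this sketch.
        if isinstance(payload, str) and all(v in payload for v in marker_values):
            dropboxes.add(msg["To"])
    return dropboxes


inbound = [
    # Hypothetical message generated by a fraudulent website.
    "From: noreply@amazon_phishing_url_1.com\n"
    "To: dropbox@webmail.example\n"
    "Subject: new login\n"
    "\n"
    "email=aaronsmith89@yahoo.com\n"
    "pass=hX#418+jKtr0984\n",
    # Unrelated, legitimate message.
    "From: friend@webmail.example\n"
    "To: someone@webmail.example\n"
    "Subject: lunch\n"
    "\n"
    "See you at noon.\n",
]

found = find_dropboxes(inbound, ["aaronsmith89@yahoo.com", "hX#418+jKtr0984"])
print(found)
```

Requiring every marker element to be present, rather than any one of them, keeps the match specific to the injected credentials and relies on their high entropy to avoid false positives.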
According to one embodiment, the computer-implemented method may further comprise retrieving, over the computer network, a list of the fraudulent websites from a database of known fraudulent websites and the Internet Protocol (IP) addresses therefor. In one embodiment, determining constraints may comprise consulting a database that stores the constraints on the user credentials of the fraudulent websites. In this context, consulting the database may comprise downloading a list of the fraudulent websites and periodically checking the database for updates to this list of fraudulent websites. Randomly generating the marker elements may be carried out such that the resultant fake credentials have high entropy (randomness). The fraudulent website may be configured to spoof a well-known website of an existing company (such as, for example, chase.com or amazon.com or paypal.com), product or brand. According to one embodiment, generating the fake user credentials and assembling the generated fake credentials into the marker may be carried out such that the assembled marker is specific to the existing company, product or brand. Programmatically inputting the generated fake user credentials may comprise executing a selected generic scenario or a brand, product or company-specific scenario. Whether generic or brand, product or company-specific, the scenarios may be configured to determine the manner in which the generated fake user credentials are inputted into the fraudulent website. According to one embodiment, programmatically inputting the generated fake user credentials into the input field(s) of the login page(s) of the fraudulent website is carried out only once per IP address.
Assembling the generated fake user credentials into a marker may further comprise adding, to the marker, the IP address of the fraudulent website and the date (and time) on which the generated fake user credentials were programmatically inputted into the input field(s) of the login page(s) of the fraudulent website. Other information may also be added to, or used in place of, the previously-described information. According to one embodiment, publishing may comprise sending a copy of the assembled marker data structure to the provider of the email service of the received email (in many cases, a free webmail provider) and to a company or brand spoofed by the fraudulent website. These markers enable the recipient thereof to detect the phishing dropbox or dropboxes and to take curative action, which may include deleting the phishing dropbox and canceling the account of the owner of the dropbox.
As shown, the phishing detection engine may comprise a marker generation engine 810 and a marker injection engine 809. The marker generation engine 810 may be configured to generate the elements of the fake credentials (or select from pre-existing marker elements) that are to be input into the login fields of the identified fraudulent website and the marker injection engine 809 may be configured to programmatically input the generated fake credentials into their appropriate input fields of the fraudulent website. Web driver technology may be used for this purpose; that is, to remotely and programmatically control the fraudulent website to accept and submit the fake credentials of the generated marker. Programmatic detection of a phishing email, the identification of fraudulent websites, the generation of markers and the injection of such markers into the identified fraudulent websites may be readily scaled and automated to provide high-volume industrial-grade protection against phishing emails and the systematic eradication of phishing dropboxes as soon as they are detected.
Optionally, as shown at B9B6, the email address identified as the phishing dropbox may be canceled and/or other actions may be taken, with respect to the identified phishing dropbox.
Any reference to an engine in the present specification refers, generally, to a program (or group of programs) that performs a particular function or series of functions that may be related to functions executed by other programs (e.g., the engine may perform a particular function in response to another program or may cause another program to execute its own function). Engines may be implemented in software and/or hardware as in the context of an appropriate hardware device such as an algorithm embedded in a processor or application-specific integrated
Embodiments of the present invention are related to the use of computing devices to generate, inject and publish markers and to detect phishing dropboxes. According to one embodiment, the methods, devices and systems described herein may be provided by one or more computing devices in response to processor(s) 1002 executing sequences of instructions contained in memory 1004. Such instructions may be read into memory 1004 from another computer-readable medium, such as data storage device 1007. Execution of the sequences of instructions contained in memory 1004 causes processor(s) 1002 to perform the steps and have the functionality described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the described embodiments. Thus, embodiments are not limited to any specific combination of hardware circuitry and software. Indeed, it should be understood by those skilled in the art that any suitable computer system may implement the functionality described herein. The computing devices may include one or a plurality of microprocessors working to perform the desired functions. In one embodiment, the instructions executed by the microprocessor or microprocessors are operable to cause the microprocessor(s) to perform the steps described herein. The instructions may be stored in any computer-readable medium. In one embodiment, they may be stored on a non-volatile semiconductor memory external to the microprocessor, or integrated with the microprocessor. In another embodiment, the instructions may be stored on a disk and read into a volatile semiconductor memory before execution by the microprocessor.
While certain example embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the embodiments disclosed herein. Thus, nothing in the foregoing description is intended to imply that any particular feature, characteristic, step, module, or block is necessary or indispensable. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the embodiments disclosed herein.