The present application is related in subject matter to commonly-owned and co-pending U.S. application Ser. No. 14/542,939 filed on Nov. 17, 2014 entitled “Methods and Systems for Phishing Detection”, which is incorporated herein by reference in its entirety. The present application is also related in subject matter to commonly-owned and co-pending U.S. application Ser. No. 14/861,846 filed on Sep. 22, 2015 entitled “Detecting and Thwarting Spear Phishing Attacks in Electronic Messages”, which is also incorporated herein by reference in its entirety.
A spear phishing message is an email that appears to be from an individual that you know, but it is not. The spear phisher knows your name, your email address, your job title and your professional network. He or she knows a lot about you thanks, at least in part, to all the information available publicly on the web.
Spear phishing is a growing threat. It is, however, a very different attack from a phishing attack. The differences include the following:
The first step of a spear phishing attack may come in the form of an electronic message (e.g., an email) received from what appears to be a well-known and trusted individual, such as a coworker, colleague or friend. In contrast, a (regular, non-spear) phishing email appears to be from a trusted company such as, for example, PayPal, Dropbox, Apple and the like. The second step of a spear phishing attack has a different modus operandi: a malicious attachment or a malicious Uniform Resource Locator (URL) that is intended to lead the victim to install malicious software (malware) that will perform malicious operations (data theft, for example), or simply text in the body of the email that will lead the victim to perform the expected action (a wire transfer, disclosure of sensitive information and the like). A regular, non-spear phishing attack relies only on a malicious URL.
To protect a user from spear phishing attacks, a protection layer, according to one embodiment, may be applied for each step of the spear phishing attack. Against the first step of the spear phishing attack, one embodiment detects an impersonation. Against the second step of the spear phishing attack, one embodiment may be configured to detect the malicious attachment, detect the malicious URL and/or detect suspect text in the body of the email or other form of electronic message.
According to one embodiment, an attempted spear phishing attack may be thwarted or prevented through detection of the impersonation. To prevent such an impersonation, according to one embodiment, when a user receives an electronic message from an unknown sender or from what may look like a known sender, it may be determined whether the sender's email address or display name looks like that of a known contact of the user. If this is indeed the case, the user may be warned that there may be an impersonation.
To fully appreciate the embodiments described, shown and claimed herein, it is necessary to understand the difference between an electronic or email address and a display name. The display name is what is usually displayed in the email client software to identify the sender. It is typically the first name and the last name of the sender of the email or electronic message. Consider the following From header:
From: John Smith <john.smith@gmail.com>
In this case, the display name is “John Smith” and the email address is “john.smith@gmail.com”.
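For illustration, the two components may be extracted from a raw From header value using Python's standard email.utils module; a minimal sketch:

from email.utils import parseaddr

# Split a raw From header value into (display name, email address).
display_name, email_address = parseaddr("John Smith <john.smith@gmail.com>")
print(display_name)    # John Smith
print(email_address)   # john.smith@gmail.com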
The protection layer, according to one embodiment, may comprise the following activities:
The following is a software implementation showing aspects of one embodiment, as applied to email addresses.
Several examples of email address impersonation or spoofing are shown in
Several examples of display name impersonation are shown in
Managing List of Known Contacts Email Addresses
According to one embodiment, a list of his or her known contacts' email addresses, called KNOWN_ADDRESSES, may be managed for the end user. This list only contains known, trusted email addresses. In one implementation, all email addresses in this list are stored as lowercase.
The KNOWN_ADDRESSES list, according to one embodiment, may be initially fed by one or more of:
1. The email addresses stored in the address book of the end user. However, if the email address book exceeds ADDRESS_BOOK_MAX_SIZE (default value: 1,000 but may be higher or lower), the address book may not be used for performance and accuracy reasons. Address books of very large companies can become that large if, for example, they maintain a single address book for the contact information of all of their employees.
2. The email addresses stored in the "From" header of emails or electronic messages received by the end user with the exception, according to one embodiment, of automated emails such as email alerts, newsletters, advertisements or any email that has been sent by an automated process.
3. The email addresses of people to whom the end user has sent an email.
The KNOWN_ADDRESSES list may be updated in one or more of the following cases:
1) When the address book is updated.
2) When the end user receives an email from a non-suspect new contact with the exception, according to one embodiment, of automated emails such as email alerts, newsletters, advertisements or any email that has been sent by an automated process.
3) When the end user sends an email to a new contact.
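By way of illustration only, the seeding and update rules above might be combined as in the following sketch; the class and method names, and the is_automated flag, are hypothetical and not part of the embodiment:

class KnownAddresses:
    """Maintains the KNOWN_ADDRESSES list of trusted email addresses."""

    ADDRESS_BOOK_MAX_SIZE = 1000  # assumed default; may be set higher or lower

    def __init__(self):
        self.addresses = set()  # all entries stored as lowercase

    def seed_from_address_book(self, address_book):
        # Very large address books are skipped for performance and accuracy reasons.
        if len(address_book) <= self.ADDRESS_BOOK_MAX_SIZE:
            self.addresses.update(addr.lower() for addr in address_book)

    def add_received(self, from_address, is_automated):
        # Ignore alerts, newsletters, advertisements and other automated mail.
        if not is_automated:
            self.addresses.add(from_address.lower())

    def add_sent(self, to_address):
        # The end user sent an email to this contact, so the address is trusted.
        self.addresses.add(to_address.lower())

    def is_known(self, address):
        return address.lower() in self.addresses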
Managing List of Known Contacts Display Names
A list of the user's known contacts may be managed for the user. This list may be called KNOWN_DISPLAY_NAMES. According to one embodiment, this list may only contain normalized display names, which may be stored as lowercase strings. Normalization, in this context, refers to one or more predetermined transformations to which all display names are subjected in order to enable comparisons to be made.
The KNOWN_DISPLAY_NAMES, according to one embodiment, may be initially fed by one or more of:
1. The display names stored in the address book of the end user. However, if the email address book exceeds ADDRESS_BOOK_MAX_SIZE (default value: 1,000 but may be higher or lower), the address book may not be used for performance and accuracy reasons.
2. The display names stored in the "From" header of emails received by the end user with the exception, according to one embodiment, of automated emails such as email alerts, newsletters, advertisements or any email that has been sent by an automated process.
3. The display names of people to whom the end user has sent an email.
The KNOWN_DISPLAY_NAMES may then be updated, according to one embodiment, in one or more of the following cases:
Normalizing Display Names
The display name, according to one embodiment, may be normalized because:
There may be other reasons to normalize display names.
According to one embodiment, the normalization may be carried out as follows or according to aspects of the following:
As an example,
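The specific transformations are implementation-dependent and are not reproduced here; the sketch below assumes, purely for illustration, a typical set of steps (lowercasing, accent stripping, punctuation removal and whitespace collapsing):

import re
import unicodedata

def normalize_display_name(display_name):
    """Normalize a display name for comparison (the steps below are assumed)."""
    # Lowercase, since entries in KNOWN_DISPLAY_NAMES are stored as lowercase.
    name = display_name.lower()
    # Strip accents and other diacritics (e.g., "josé" becomes "jose") -- assumed step.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    # Replace punctuation such as quotes and commas with spaces -- assumed step.
    name = re.sub(r"[^\w\s]", " ", name)
    # Collapse runs of whitespace into a single space.
    name = re.sub(r"\s+", " ", name).strip()
    return name

print(normalize_display_name('  "Smith,  John"  '))  # prints: smith john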
Managing List of Blacklisted Email Addresses
According to one embodiment, a list of blacklisted email addresses called BLACKLISTED_ADDRESSES may be managed for the user. This list of blacklisted email addresses will only contain email addresses that are always considered to be illegitimate and malicious. In one implementation, all email addresses in this blacklisted email address list will be stored as lowercase. If an email is sent by a sender whose email address belongs to BLACKLISTED_ADDRESSES, then the email will be dropped and will not be delivered to the end user, according to one embodiment. Other actions may be taken as well, or in place of dropping the email.
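A minimal sketch of this check (the function name and example entry are hypothetical; dropping the message stands in for whatever disposition the deployment's policy dictates):

BLACKLISTED_ADDRESSES = {"attacker@malicious.example"}  # entries stored as lowercase

def should_drop(sender_address):
    """Return True if the message must be dropped rather than delivered."""
    return sender_address.lower() in BLACKLISTED_ADDRESSES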
Detecting a Suspect Email Address
When the end user receives an electronic message such as an email, a determination is made as to whether the sender's email address is known, by consulting the KNOWN_ADDRESSES list. If the email address of the email's sender is present in the KNOWN_ADDRESSES list, the email address may be considered to be known. If, however, the email address of the sender is not present in the KNOWN_ADDRESSES list, the sender's email address is not considered to be known. In that case, according to one embodiment, a further determination may be made as to whether the email address resembles or "looks like" a known address.
An email address is made up of a local part, a @ symbol and a domain part. The local part is the left side of the email address, before the @ symbol. For example, “john.smith” is the local part of the email address john.smith@gmail.com. The domain is located at the right side of the email address, after the @ symbol. For example, “gmail.com” is the domain of the email address john.smith@gmail.com.
According to one embodiment, an email address may be considered to be suspect if the following conditions are met:
One embodiment utilizes the Levenshtein distance (a type of edit distance). The Levenshtein distance between two sequences of characters is the minimum number of single-character edits (i.e., insertions, deletions or substitutions) required to change one sequence of characters into the other. Applied to two input strings (e.g., the local part of the received email address and the local part of an email address in the KNOWN_ADDRESSES list), it quantifies how closely one string resembles the other. One embodiment, therefore, computes a string metric such as the Levenshtein distance to detect whether there has been a likely spoofing of the local part of the received email address. Other string metrics that may be used in this context include, for example, the Damerau-Levenshtein distance. Others, such as the Jaccard distance or the Jaro-Winkler distance, may be used to good benefit as well.
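A compact reference implementation of the Levenshtein distance, using the standard dynamic-programming formulation (shown for illustration; it is not taken from the embodiment itself):

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions required to transform string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep b as the shorter string so the DP row stays small
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[len(b)]

print(levenshtein("john.smith", "john.smiht"))  # 2: the transposition costs two single-character edits

Under the Damerau-Levenshtein distance, which additionally counts a transposition of two adjacent characters as a single edit, the same pair of strings would be at distance 1.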
According to one embodiment, the local part of the email address may be considered suspect if the Levenshtein distance d (or some other metric d) between the local part of the email address and the local part of an email address of a record in the KNOWN_ADDRESSES list is such that:
This evaluation of the local part of a received email address against the local part of a record in the KNOWN_ADDRESSES list may be carried out as follows:
It should be noted that the parameters levenshtein_distance_threshold and localpart_min_length may be configured according to the operational conditions at hand and the security policy or policies implemented by the company or other deploying entity.
For example, if the levenshtein_distance_threshold is increased, then a greater number of spoofing attempts may be detected, albeit at the cost of a greater number of potentially non-relevant warning messages being presented to the user. The default values provided above should fit most operational conditions. As an alternative to the Levenshtein distance, the Damerau-Levenshtein distance may also be used, as may other metrics and/or thresholds.
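Putting these pieces together, a suspect-address check along the lines described might look like the sketch below, which reuses the levenshtein function shown earlier. The default parameter values and the exact combination of conditions (including the same-local-part/different-domain test) are assumptions made here for illustration, since the embodiment leaves them configurable:

def is_suspect_address(sender_address, known_addresses,
                       levenshtein_distance_threshold=2,  # assumed default
                       localpart_min_length=5):           # assumed default
    """Return True if sender_address is unknown but resembles a known address."""
    sender_address = sender_address.lower()
    if sender_address in known_addresses:
        return False  # exact match with a trusted contact: not suspect
    sender_local, _, sender_domain = sender_address.rpartition("@")
    for known in known_addresses:
        known_local, _, known_domain = known.rpartition("@")
        # Identical local part on a different domain, e.g. john.smith@gmai1.com.
        if sender_local == known_local and sender_domain != known_domain:
            return True
        # Local part within a small edit distance of a known local part.
        if (len(sender_local) >= localpart_min_length
                and 0 < levenshtein(sender_local, known_local) <= levenshtein_distance_threshold):
            return True
    return False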
Detecting a Suspect Display Name
According to one embodiment, a string metric such as, for example, the Levenshtein distance may also be used to detect whether a display name has been spoofed or impersonated.
The detection of a suspect display name may be carried out, according to one embodiment, as follows:
It is to be understood that parameters such as levenshtein_distance_threshold and display_name_min_length may be configured according to the prevailing operational conditions and security policy or policies of the company or other deploying entity.
For example, if the levenshtein_distance_threshold or other metric or threshold is increased, a greater number of spoofing attempts may be detected, but at the possible cost of a greater number of non-relevant warnings that may negatively alter the user experience. The default values provided, however, should fit most operational conditions. As an alternative to Levenshtein distance [2], the Damerau-Levenshtein distance or other metrics or thresholds may be utilized to good effect.
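An analogous sketch for display names, reusing the levenshtein and normalize_display_name helpers from above; the defaults are again assumptions, and whether an exact display-name match (a distance of zero) from an otherwise unknown sender should also trigger a warning is treated here as a policy choice:

def is_suspect_display_name(display_name, known_display_names,
                            levenshtein_distance_threshold=2,  # assumed default
                            display_name_min_length=5):        # assumed default
    """Return True if the normalized display name resembles that of a known
    contact. Intended to be called only for senders whose email address is
    not already present in KNOWN_ADDRESSES."""
    name = normalize_display_name(display_name)
    if len(name) < display_name_min_length:
        return False
    for known in known_display_names:  # entries are already normalized
        d = levenshtein(name, known)
        # d == 0 means the display name exactly matches a known contact even
        # though the sender's address is unknown -- treated as suspect here.
        if d <= levenshtein_distance_threshold:
            return True
    return False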
Warning the End User
If it is determined that the received email impersonates a known email address or display name, a message may be generated to warn the end user, who must then make a decision:
At B74 in
Any reference to an engine in the present specification refers, generally, to a program (or group of programs) that performs a particular function or series of functions that may be related to functions executed by other programs (e.g., the engine may perform a particular function in response to another program or may cause another program to execute its own function). Engines may be implemented in software and/or in hardware, as in the context of an appropriate hardware device, such as an algorithm embedded in a processor or an application-specific integrated circuit.
Embodiments of the present invention are related to the use of computing devices 812, 808, 810 to detect whether a received electronic message may be illegitimate as including a spear phishing attack. According to one embodiment, the methods and systems described herein may be provided by one or more computing devices 812, 808, 810 in response to processor(s) 902 executing sequences of instructions contained in memory 904. Such instructions may be read into memory 904 from another computer-readable medium, such as data storage device 907. Execution of the sequences of instructions contained in memory 904 causes processor(s) 902 to perform the steps and have the functionality described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the described embodiments. Thus, embodiments are not limited to any specific combination of hardware circuitry and software. Indeed, it should be understood by those skilled in the art that any suitable computer system may implement the functionality described herein. The computing devices may include one or a plurality of microprocessors working to perform the desired functions. In one embodiment, the instructions executed by the microprocessor or microprocessors are operable to cause the microprocessor(s) to perform the steps described herein. The instructions may be stored in any computer-readable medium. In one embodiment, they may be stored on a non-volatile semiconductor memory external to the microprocessor, or integrated with the microprocessor. In another embodiment, the instructions may be stored on a disk and read into a volatile semiconductor memory before execution by the microprocessor.
While certain example embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the embodiments disclosed herein. Thus, nothing in the foregoing description is intended to imply that any particular feature, characteristic, step, module, or block is necessary or indispensable. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the embodiments disclosed herein.