A spear phishing email is an email that appears to be from a known person or entity, but is not. The spear phisher often knows the intended victim's name, address, job title and professional network, thanks to the quantity and rich variety of information publicly available through online sources, the media and social networks.
Spear phishing is a growing threat. It differs, however, from a conventional phishing attack in several respects. The differences between a phishing attack and a spear phishing attack may include the following:
According to one embodiment, to protect a user from a spear phishing attack, a protection layer may be applied for each phase of the spear phishing attack. That is, during the first phase of the spear phishing attack, one embodiment detects whether an impersonation of a known sender is likely. During the second phase of the spear phishing attack, a detection procedure may be carried out to determine whether the suspicious email contains a malicious attachment, a malicious URL or suspect text in the body of the email.
According to one embodiment, to detect whether an email constitutes a potential spear phishing attack, the “From” email address (the sender's email address) may be scrutinized to detect whether the sender is a legitimate, known and trusted entity or is potentially an impersonation of the same. According to one embodiment, if a user receives an email from an unknown sender, a check may be carried out to determine whether the sender's email address is a known contact of the email recipient. If the sender's email address looks like, but is in any way different from, a known contact of the recipient, the email recipient may be warned (through the generation of a visual and/or audio cue, for example) that the email is at least potentially illegitimate, as impersonating a known contact, which is the essence of a spear phishing attack.
One embodiment is configured to protect the user (e.g., an email recipient) by carrying out activities including:
Managing List of Known Email Contacts
According to one embodiment, a list of the protected user's known email contacts, called KNOWN_CONTACTS, may be created and maintained. All email addresses in this list may be stored in lowercase. According to one embodiment, the KNOWN_CONTACTS list may be initially seeded from the protected user's address book. According to one embodiment, for performance and accuracy reasons, the protected user's address book may not be used if it exceeds a predetermined maximum number of entries (say 1,000, for example). This predetermined maximum number of entries may be represented by an ADDRESS_BOOK_MAX_SIZE variable (whose default value may be set at 1,000). Very large address books may, for example, be associated with very large companies that share the whole company address book with all employees.
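By way of non-limiting illustration only, such seeding may be sketched in Python as follows; the names seed_known_contacts and address_book are merely hypothetical placeholders and do not limit the embodiments described herein:

    ADDRESS_BOOK_MAX_SIZE = 1000  # default cap on address book entries used for seeding

    def seed_known_contacts(address_book):
        # Seed KNOWN_CONTACTS from the protected user's address book, storing
        # every address in lowercase. Address books larger than
        # ADDRESS_BOOK_MAX_SIZE (e.g., a shared company-wide address book) are
        # not used, for performance and accuracy reasons.
        known_contacts = set()
        if len(address_book) <= ADDRESS_BOOK_MAX_SIZE:
            for address in address_book:
                known_contacts.add(address.strip().lower())
        return known_contacts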
Another source of legitimate email addresses with which to populate the KNOWN_CONTACTS list is the email addresses of emails received by the end user, with the exception of automated emails such as email alerts, newsletters, advertisements or any email that has been sent by an automated process. The email addresses of people to whom the end user has sent an email are another source of legitimate email addresses. According to one embodiment, KNOWN_CONTACTS may be updated in one or more of the following cases:
Managing List of Blacklisted Contacts
According to one embodiment, a list of blacklisted email contacts called BLACKLIST may also be established and managed. All email addresses in this list are likewise stored in lowercase. According to one embodiment, if an email is sent by a sender whose email address belongs to BLACKLIST, then that email will be dropped and will not be delivered to the protected user.
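By way of illustration only, such a blacklist check may be sketched as follows; the helper name is_blacklisted is hypothetical:

    BLACKLIST = set()  # blacklisted sender addresses, stored in lowercase

    def is_blacklisted(sender_address):
        # A sender is blacklisted if its lowercased address appears in BLACKLIST;
        # such an email is dropped rather than delivered to the protected user.
        return sender_address.strip().lower() in BLACKLIST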
Detecting a Potentially Suspect or Illegitimate Email Address
When a protected user receives an email, a check may be carried out to determine whether the sender's email address is known. The KNOWN_CONTACTS list may be consulted for this purpose. If the email address is not known (e.g., is not present in the KNOWN_CONTACTS list), a further check may be carried out, according to one embodiment, to determine whether the email address looks like or is otherwise similar to a known address. An email address is made up of a local part, the @ symbol and a domain part:
According to one embodiment, an email may be considered to be suspect or potentially illegitimate if both of the following conditions are met:
According to one embodiment, a detection process may be carried out to determine whether the local part of the received email address has been spoofed, to appear to resemble the local part of an email address in the KNOWN_CONTACTS list. According to one embodiment, such a detection process may utilize a string metric to compare the local part of an email address in the KNOWN_CONTACTS list with the local part of the received email address. A string metric (also known as a string similarity metric or string distance function) is a metric that measures the distance (“inverse similarity”) between two text strings, for approximate string matching or comparison and in fuzzy string searching. A string metric may provide a number that is an indication of the distance or similarity between two (e.g., alpha or alphanumeric) strings.
One embodiment utilizes the Levenshtein Distance (also known as Edit Distance). The Levenshtein Distance operates on two input strings and returns a number equivalent to the number of insertions, deletions and substitutions needed to transform one input string (e.g., the local part of the received email address) into the other (e.g., the local part of an email address in the KNOWN_CONTACTS list). One embodiment, therefore, computes a string metric such as the Levenshtein distance to detect whether there has been a likely spoofing of the local part of the received email address. More formally, the Levenshtein distance between two sequences of characters is the minimum number of single-character edits (i.e., insertions, deletions or substitutions) required to change one sequence of characters into the other. Other string metrics that may be used in this context include, for example, the Damerau-Levenshtein distance. Others may be used to good benefit as well.
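By way of illustration, the Levenshtein distance may be computed with a standard dynamic-programming routine, sketched here in Python; the function name levenshtein_distance is a hypothetical placeholder:

    def levenshtein_distance(a, b):
        # Minimum number of single-character insertions, deletions or
        # substitutions required to transform string a into string b.
        # Example: levenshtein_distance("jsmith", "j5mith") == 1
        if len(a) < len(b):
            a, b = b, a                      # ensure a is the longer string
        previous = list(range(len(b) + 1))   # distances from the empty prefix of a
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                insert_cost = current[j - 1] + 1
                delete_cost = previous[j] + 1
                substitute_cost = previous[j - 1] + (ca != cb)
                current.append(min(insert_cost, delete_cost, substitute_cost))
            previous = current
        return previous[-1]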
According to one embodiment, an email address is considered as suspect if the string metric (the Levenshtein Distance in one implementation) d between the local part of the email address and the local part of an email address of KNOWN_CONTACTS is such that
d ≤ STRING_METRIC_DISTANCE_THRESHOLD
One implementation may include the following functionality:
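A non-limiting Python sketch of such functionality is given below. It reuses the levenshtein_distance routine sketched above and assumes the KNOWN_CONTACTS list described earlier; the function name is_suspect_address is hypothetical, while STRING_METRIC_DISTANCE_THRESHOLD and localpart_min_length are the configurable parameters discussed below:

    STRING_METRIC_DISTANCE_THRESHOLD = 2   # maximum edit distance treated as "looks like"
    localpart_min_length = 6               # minimum local-part length worth comparing

    def is_suspect_address(sender_address, known_contacts):
        # Known, trusted addresses (stored in lowercase) are never suspect.
        sender_address = sender_address.strip().lower()
        if sender_address in known_contacts:
            return False
        # Extract the local part (the portion before the @ symbol).
        sender_local = sender_address.split("@", 1)[0]
        if len(sender_local) < localpart_min_length:
            return False  # very short local parts are too noisy to compare reliably
        # Suspect if the local part is within the threshold distance of the
        # local part of any known contact's address.
        for contact in known_contacts:
            contact_local = contact.split("@", 1)[0]
            d = levenshtein_distance(sender_local, contact_local)
            if d <= STRING_METRIC_DISTANCE_THRESHOLD:
                return True
        return False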
Above, the minimum length for the local part of the email address has been set at 6 characters and the STRING_METRIC_DISTANCE_THRESHOLD has been set at 2. Of course, other values may be substituted for these values. Indeed, the parameters STRING_METRIC_DISTANCE_THRESHOLD and localpart_min_length may be readily configured according to operational conditions and according to the security policies of the deploying organization.
For example, if the STRING_METRIC_DISTANCE_THRESHOLD is increased, a greater number of spoofing attempts may be detected, but a greater number of false positives (email addresses that are legitimate but are flagged as potentially illegitimate) may be generated. A greater number of false positives may erode the user experience and degrade the confidence of the protected user in the system and may lead the user to disregard flagged emails.
Flagging an Email as Potentially Illegitimate/Generating Warning Cue
If the email address is suspect, a visual (for example) cue (such as a message) may be generated to warn the protected user. According to one embodiment, the protected user may then be called upon to make a decision to:
One implementation may include the following functionality:
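A non-limiting sketch of such functionality, building on the is_suspect_address routine sketched above, is given below; the names warn_user, deliver and handle_incoming_email are hypothetical placeholders for the deployed client's warning cue and delivery mechanisms:

    def warn_user(message):
        # Placeholder for the visual and/or audio cue described above; here a
        # simple console prompt stands in for a dialog box or warning banner.
        return input(message + " Treat as suspect? [y/N] ").strip().lower() == "y"

    def deliver(email):
        # Placeholder delivery step (e.g., move the message into the inbox).
        print("Delivering message from", email["from"])

    def handle_incoming_email(email, known_contacts, blacklist):
        sender = email["from"].strip().lower()
        # Blacklisted senders are dropped outright and never delivered.
        if sender in blacklist:
            return "dropped"
        # Suspect look-alike addresses are flagged and the protected user decides.
        if is_suspect_address(sender, known_contacts):
            if warn_user("The sender %s resembles, but does not match, one of "
                         "your known contacts." % sender):
                return "dropped"
        deliver(email)
        return "delivered"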
As shown at B38, if the purported known sender does not match one of the plurality of known senders in the database of known senders and the quantified degree of similarity of the purported known sender of the electronic message to one of the plurality of known senders of electronic messages is indeed greater than the threshold value, the received electronic message may be flagged as being suspect. Thereafter, a visual and/or other perceptible cue, warning message, dialog box and the like may be generated when the received electronic message has been flagged as being suspect, to alert the recipient thereof that the flagged electronic message is likely illegitimate.
According to one embodiment, the electronic message may be or may comprise an email. In Block B33, the quantifying may comprise calculating a string metric of the difference between the purported sender and one of the plurality of known senders in the database of known senders. In one embodiment, the string metric may comprise a Levenshtein distance between the purported sender and one of the plurality of known senders in the database of known senders.
After block B39, a prompt may be generated, to solicit a decision confirming the flagged electronic message as being suspect or a decision denying that the flagged electronic message is suspect. Thereafter, the electronic message flagged as suspect may be dropped when the prompted decision is to confirm that the flagged electronic message is suspect and the flagged electronic message may be delivered to its intended recipient when the prompted decision is to deny that the flagged electronic message is suspect.
Any reference to an engine in the present specification refers, generally, to a program (or group of programs) that performs a particular function or series of functions that may be related to functions executed by other programs (e.g., the engine may perform a particular function in response to another program or may cause another program to execute its own function). Engines may be implemented in software or in hardware, such as an algorithm embedded in a processor or in an application-specific integrated circuit.
Embodiments of the present invention are related to the use of computing device 412, 408, 410 to detect and compute a probability that received email may be or may include a spear phishing attack. According to one embodiment, the methods and systems described herein may be provided by one or more computing devices 412, 408, 410 in response to processor(s) 502 executing sequences of instructions contained in memory 504. Such instructions may be read into memory 504 from another computer-readable medium, such as data storage device 507. Execution of the sequences of instructions contained in memory 504 causes processor(s) 502 to perform the steps and have the functionality described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the described embodiments. Thus, embodiments are not limited to any specific combination of hardware circuitry and software. Indeed, it should be understood by those skilled in the art that any suitable computer system may implement the functionality described herein. The computing devices may include one or a plurality of microprocessors working to perform the desired functions. In one embodiment, the instructions executed by the microprocessor or microprocessors are operable to cause the microprocessor(s) to perform the steps described herein. The instructions may be stored in any computer-readable medium. In one embodiment, they may be stored on a non-volatile semiconductor memory external to the microprocessor, or integrated with the microprocessor. In another embodiment, the instructions may be stored on a disk and read into a volatile semiconductor memory before execution by the microprocessor.
While certain example embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the embodiments disclosed herein. Thus, nothing in the foregoing description is intended to imply that any particular feature, characteristic, step, module, or block is necessary or indispensable. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the embodiments disclosed herein.
The present application is related in subject matter to commonly-owned and co-pending U.S. patent application Ser. No. 14/542,939 filed on Nov. 17, 2014 entitled “Methods and Systems for Phishing Detection”, which is incorporated herein by reference in its entirety.