Phishing attack detection

Description

FIELD

Embodiments of the disclosure relate to the field of cybersecurity. More specifically, embodiments of the disclosure relate to a system and method for detecting phishing attacks.

GENERAL BACKGROUND

Phishing is a growing problem on the internet. Phishing is the attempt to obtain sensitive information from targets by disguising requests as legitimate. A phishing attack can entail the transmission of an electronic communication, such as an email, to one or more recipients that purports to be from a known institution, such as a bank or credit card company, and seems to have a legitimate intention; however, the email is actually intended to deceive the recipient into sharing its sensitive information. Often the email draws the recipient to a counterfeit version of the institution's webpage designed to elicit the sensitive information, such as the recipient's username, password, etc.

For example, a malware author may transmit an email to a recipient purporting to be from a financial institution and asserting that a password change is required to maintain access to the recipient's account. The email includes a Uniform Resource Locator (URL) that directs the recipient to a counterfeit version of the institution's website requesting the recipient to enter sensitive information in a displayed form in order to change the recipient's password. Neither the email nor the URL is associated with the actual financial institution or its genuine website, although the email and the counterfeit website may have an official “look and feel” and imitate a genuine email and website of the institution. The phishing attack is completed when the recipient of the email enters and submits sensitive information to the website, which is then delivered to the malware author.

Current solutions for phishing detection include textual search and analysis of emails and displayed webpages. However, such solutions have several drawbacks and too often fail to detect phishing attacks. As a first drawback, current textual search-based phishing detection systems may be unable to determine whether a website to which a URL resolves is a phishing website because the website displays too little text; in that case, a textual search analysis may not have enough data to allow an accurate analysis. As a second drawback, current textual search-based solutions may be unable to analyze the website to which the URL resolves when the text of the website is contained within one or more images (e.g., bitmaps, JPEGs, etc.), which cannot be processed using a textual search-based analysis. As a third drawback, current textual search-based solutions may be unable to perform the necessary textual search and analysis in many languages due to an insufficient corpus of data, and thus provide a lackluster solution given the global nature of attacks on businesses today (e.g., the large number of characters in Asian languages makes a textual search-based analysis difficult). Thus, a new phishing detection technique is needed to more efficiently, efficaciously, and reliably detect phishing cybersecurity attacks (“cyberattacks”) of this type.

DESCRIPTION OF THE DRAWINGS

Embodiments of the disclosure are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 is an exemplary block diagram of a logic flow during performance of a training process by a phishing detection and analysis system (PDAS) according to an embodiment of the invention;

FIG. 2 is an exemplary block diagram of a logic flow during performance of a detection process by a phishing detection and analysis system (PDAS) according to an embodiment of the invention;

FIGS. 3A-3B provide a flowchart illustrating an exemplary method for analyzing a URL by the PDAS of FIG. 4 to determine whether the URL is associated with a phishing attack; and

FIG. 4 is an exemplary embodiment of a logical representation of the phishing detection and analysis system of FIG. 1.

DETAILED DESCRIPTION

I. Overview Summary

Embodiments of systems and methods for detecting phishing attacks are described. The phishing detection and analysis system (PDAS) is configured to detect a phishing attack through the use of computer vision techniques that leverage a graphic representation (i.e., the representation expressing the “look and feel”) of a webpage to determine whether the webpage is attempting to mimic a legitimate webpage. Some specific embodiments attempt to determine whether a webpage is attempting to mimic a webpage through which a user enters sensitive information, such as login credentials, for purposes of stealing such sensitive information.

As a general overview, the PDAS described herein includes (i) a training process and (ii) a detection process. The training process generates a machine learning model (“model”), the model including a set of correlation rules, that is used in the detection process. The detection process receives a URL and analyzes the URL based on the model to make a determination as to whether the URL is part of a phishing cyberattack.

The training process involves the generation of a model using machine learning techniques. The model represents a categorization of a training set of URLs into one or more webpage families, where the URLs of the training set are known to be associated with genuine (non-phishing) websites (in some embodiments, known phishing URLs may also be provided to improve the model). The training process includes retrieval of a screenshot associated with each URL of the training set and processing of each screenshot to (i) detect a set of keypoints and (ii) generate a feature vector corresponding to the detected set of keypoints. A feature may be interpreted as a keypoint and its corresponding keypoint descriptors, e.g., parameters of the keypoint enabling identification of the keypoint and its location within a screenshot. After generation of the feature vectors, the feature vectors are labeled based on a known webpage family and the model is generated, the model being a digitized representation of the correlation of the feature vectors corresponding to the URLs within the training set. In some embodiments, the webpage families may represent a set of URL domains.

The model is generated for use in the detection process, discussed below, to identify which feature vector, and thus which URL within the training set, has keypoints that most closely correlate with those of a screenshot under analysis. A URL under analysis, a webpage under analysis or a screenshot under analysis may be referred to as a subject URL, a subject webpage and a subject screenshot, respectively. More specifically, the analysis of a subject screenshot using the model results in a set of confidences, each confidence corresponding to a feature vector of a URL within the training set. The highest confidence indicates the highest correlation between the subject screenshot and a feature vector corresponding to a URL within the training set. The correlation is based on the generated feature vectors of (i) the subject screenshot and (ii) the screenshot(s) corresponding to the URLs within the training set.

A webpage family refers to a set of webpages associated with a particular company or other organization that shares a webpage design system reflecting the branding and design elements (including logos, layout and visual landscape (e.g., color, contrast, etc.)) of the organization to provide site visitors with a consistent and recognizable visual experience. A webpage family may include one or more webpage members, though the families of most interest in practicing the invention (i.e., those generally used in phishing attacks) will generally include plural webpages. The members of a webpage family may, and generally will, differ from one another, for example in message content (such as textual content, user-interactive elements, and pictorial elements) and even graphical elements; hence, they will generally exhibit variations, called “variances,” across the family. Often the members of a webpage family share a domain name and/or other URL components, but that is not a necessary and sufficient determinant of membership in the family, since similar domain names and other URL components may mislead visitors as to the “owner” of the website. However, domain name sharing may be used as one aspect in determining family membership. Accordingly, the invention may use computer vision to determine family membership.

It should be noted that any variances will be such that webpages within a webpage family have a consistent layout and visual landscape. As discussed below, webpages belonging to a particular owner (e.g., a single company sharing a domain name) may differ in some aspects of the layout and visual landscape (e.g., in the number of input types and/or input forms, such as textboxes in a first webpage and radio buttons in a second webpage). In such an embodiment, the webpages may be divided into two webpage families based on detected keypoints, with both webpage families being linked to the single owner for the detection process.

In particular, during the training process, a set of keypoints is detected for each screenshot corresponding to a URL within the training set of URLs. Each keypoint detected within the screenshot identifies a “point of interest,” that is, a distinctive, observable region of the image. A “keypoint” may be defined as an image region, e.g., a set of pixels, within the screenshot (e.g., a region of any shape, such as a circular region). Known keypoint detection techniques, such as rule sets that detect keypoints based on pixel density, Scale-Invariant Feature Transform (SIFT), Features from Accelerated Segment Test (FAST) and/or Binary Robust Invariant Scalable Keypoints (BRISK), may be utilized to detect keypoints.
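
To make the FAST criterion concrete, the following is a minimal sketch of its segment test in plain Python, assuming a screenshot already converted to a greyscale list of pixel rows. The function names, the threshold `t=20` and the run length `n=9` are illustrative choices, not parameters taken from the PDAS; a production system would more likely rely on a library implementation (e.g., OpenCV) and add non-maximum suppression.

```python
# Minimal FAST-style segment test (illustrative; no non-maximum suppression).
# Assumes a greyscale screenshot stored as a list of rows: img[y][x] in 0..255.

# 16 (dx, dy) offsets on a Bresenham circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=9):
    """True if n contiguous circle pixels are all brighter than p+t or all darker than p-t."""
    p = img[y][x]
    states = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        states.append(1 if v > p + t else -1 if v < p - t else 0)
    for target in (1, -1):
        run = 0
        # Walk the circle twice so that runs wrapping past the last offset are counted.
        for s in states + states:
            run = run + 1 if s == target else 0
            if run >= n:
                return True
    return False


def detect_fast(img, t=20, n=9):
    """Return (x, y) for every pixel, at least 3 px from the border, that passes the test."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t, n)]


# Usage: a bright 10x10 square (a stand-in for a logo) on a dark 20x20 background.
img = [[255 if 5 <= x <= 14 and 5 <= y <= 14 else 0 for x in range(20)]
       for y in range(20)]
corners = detect_fast(img)
```

Only pixels near the square's four corners pass; pixels along its straight edges and in its uniform interior do not, which illustrates why corner-like structures such as logo outlines tend to yield keypoints.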

Subsequently, keypoint descriptors corresponding to the detected keypoints are determined. A keypoint descriptor may include a set of one or more parameters that describe the keypoint, such as keypoint center coordinates x and y relative to the screenshot, a scale (e.g., the radius of a circular image region, when applicable), and/or an orientation determined by the gradient of the pixel greyscale within the keypoint. These parameters enable the generated model to be invariant to orientation or scale differences in the subject screenshot. Each keypoint descriptor provides the ability to reliably identify a keypoint within a screenshot. The keypoints and/or keypoint descriptors of the processed screenshot may be stored in a data store (e.g., a database).
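
The descriptor parameters above can be pictured as a small record. The sketch below, assuming a greyscale image stored as a list of rows and a circular keypoint region, estimates orientation by summing central-difference gradients over the region. This is a deliberate simplification of, for example, SIFT's gradient-orientation histogram, and the names `KeypointDescriptor` and `describe_keypoint` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class KeypointDescriptor:
    x: int              # keypoint center column in the screenshot
    y: int              # keypoint center row (image y axis points down)
    scale: float        # radius of the circular image region
    orientation: float  # dominant gradient direction in degrees, in [0, 360)


def describe_keypoint(img, cx, cy, radius):
    """Sum central-difference gradients over a circular region and take their direction."""
    gx_sum = gy_sum = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 > radius ** 2:
                continue  # outside the circular keypoint region
            gx_sum += img[y][x + 1] - img[y][x - 1]
            gy_sum += img[y + 1][x] - img[y - 1][x]
    angle = math.degrees(math.atan2(gy_sum, gx_sum)) % 360
    return KeypointDescriptor(cx, cy, float(radius), angle)


ramp_x = [[10 * x for x in range(12)] for _ in range(12)]  # brighter to the right
ramp_y = [[10 * y for _ in range(12)] for y in range(12)]  # brighter downward
kp = describe_keypoint(ramp_x, 6, 6, 3)  # orientation is 0 degrees
kp_down = describe_keypoint(ramp_y, 6, 6, 3)  # orientation is about 90 degrees
```

Because the orientation is measured from the image content itself, a descriptor computed relative to it is unaffected by a uniform rotation of the region, which is the invariance the paragraph above refers to.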

More specifically, the training process begins upon receipt of a list of labeled URLs. The URLs in the list resolve to webpages that are known to be commonly targeted for use in phishing attacks, such as login webpages of banks or of other online accounts of well-known companies such as Apple iTunes®, Spotify®, Netflix®, etc. The list of URLs (the “training set”) may be obtained or updated periodically or aperiodically for training of the PDAS classifier logic so as to reflect commonly visited websites. The PDAS may obtain a plurality of screenshots corresponding to a webpage associated with a URL, each such screenshot corresponding to a browser/operating system combination. A screenshot of the webpage to which each URL resolves is obtained by the PDAS, which then utilizes computer vision techniques to detect keypoints, determine keypoint descriptors and generate a feature vector for each screenshot. A feature may be interpreted as a set of keypoints and their corresponding keypoint descriptors that indicate a point of interest within the screenshot (e.g., a logo or a portion thereof). A feature vector includes the plurality of features detected within a screenshot. In some embodiments, data structures other than a vector, such as a matrix, may be used to store and organize the features in memory. As an example, the features may be distinctive aspects of a webpage that enable the PDAS, during the detection process, to determine whether a subject webpage is attempting to mimic a webpage included in the training set. The features of each screenshot are inserted into separate vectors, and each vector is labeled according to the webpage family to which the corresponding URL belongs. The plurality of labeled feature vectors is then used by the PDAS to generate a model using machine learning.
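
Because screenshots yield different numbers of keypoints while classifiers such as SVMs expect fixed-length inputs, some fixed-length encoding is needed before labeling. The sketch below uses a coarse spatial histogram of keypoint locations purely as an illustrative assumption (bag-of-visual-words encodings of descriptors are a common alternative); `grid_histogram`, `LabeledFeatureVector` and the family name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledFeatureVector:
    family: str           # webpage family label (hypothetical naming)
    vector: List[float]   # fixed-length encoding of one screenshot's features


def grid_histogram(keypoints, width, height, rows=4, cols=4):
    """Encode any number of (x, y) keypoints as a normalized rows*cols location histogram."""
    hist = [0.0] * (rows * cols)
    for x, y in keypoints:
        r = min(int(y * rows / height), rows - 1)  # clamp points on the bottom edge
        c = min(int(x * cols / width), cols - 1)   # clamp points on the right edge
        hist[r * cols + c] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Two keypoints in a 100x100 screenshot, binned on a 2x2 grid.
sample = LabeledFeatureVector("examplebank", grid_histogram([(10, 10), (90, 90)], 100, 100, 2, 2))
# sample.vector == [0.5, 0.0, 0.0, 0.5]
```

Normalizing by the keypoint count keeps a busy page and a sparse page of the same family comparable; whether that suits a given training set is a design choice.
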
As mentioned above, the model is a digitized representation of the correlation of the feature vectors corresponding to the URLs within the training set of URLs. More specifically, the model may be a collection of keypoints corresponding to the training set, described as a function implemented programmatically in code, where the function is tuned using the training set and the keypoints are selected (digitally sampled) by machine learning techniques. Once the function is generated and tuned, it can be applied to other (unknown) image keypoint sets during the detection process to analyze a subject screenshot. One example of the model is a hyperplane (or a set of hyperplanes) separating the feature vectors of different webpage families.
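
A hyperplane model can be illustrated with a multi-class perceptron, which tunes one weight vector (hyperplane) per webpage family from the labeled feature vectors. The perceptron is a stand-in chosen for brevity, not the learner stated for the PDAS, and the softmax step that turns hyperplane scores into per-family confidences is likewise an assumption; all names are hypothetical.

```python
import math


def _dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def train_perceptron(samples, labels, epochs=50):
    """Multi-class perceptron: one weight vector (hyperplane) per webpage family."""
    classes = sorted(set(labels))
    dim = len(samples[0]) + 1  # +1 for a bias term
    w = {c: [0.0] * dim for c in classes}
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            xb = list(x) + [1.0]
            pred = max(classes, key=lambda c: _dot(w[c], xb))
            if pred != y:  # move the true family's hyperplane toward x, the wrong one away
                w[y] = [a + b for a, b in zip(w[y], xb)]
                w[pred] = [a - b for a, b in zip(w[pred], xb)]
                mistakes += 1
        if mistakes == 0:  # training set is separated
            break
    return w


def confidences(w, x):
    """Softmax over per-family hyperplane scores, giving one confidence per family."""
    xb = list(x) + [1.0]
    scores = {c: _dot(wc, xb) for c, wc in w.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


w = train_perceptron([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]],
                     ["bank_a", "bank_a", "bank_b", "bank_b"])
conf = confidences(w, [0.95, 0.05])  # bank_a receives the highest confidence
```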

During the machine learning that generates the model, the training typically involves a plurality of webpages from the same webpage family, and the system is trained to recognize family membership by identifying keypoints shared (i.e., highly correlated) across those “labeled” webpages. The detection of those keypoints, including their location within the corresponding webpage, is key to later classifying an “unlabeled” webpage as being a member of the family to which it purports to belong (through visual similarity). A webpage can have a large number of keypoints (e.g., hundreds or thousands), so the training, in some embodiments, may involve selecting the keypoints that together are unique to the corresponding screenshot and can be used in training to accurately identify members of the labeled webpage families and, after training, to classify unlabeled webpages. Moreover, the keypoints can be selected so as to capture the common branding and design elements of a webpage family rather than variances across the members of the family, so that membership in the family can be accurately determined with minimal or no false positives or false negatives.
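
One simple way to picture selecting keypoints that capture shared branding rather than per-page variances is to keep only keypoint locations that recur across the family's screenshots. The sketch below assumes keypoints given as (x, y) pixel coordinates and quantizes them into grid cells; the cell size, `min_fraction` and the function name are hypothetical, and a fuller approach would also compare descriptors, since quantization can split a slightly shifted keypoint across two cells.

```python
from collections import Counter


def shared_keypoint_cells(family_keypoints, cell=16, min_fraction=1.0):
    """Keep grid cells containing a keypoint in at least min_fraction of the family's screenshots."""
    counts = Counter()
    for keypoints in family_keypoints:
        # A set, so each screenshot votes at most once per cell.
        counts.update({(x // cell, y // cell) for x, y in keypoints})
    n = len(family_keypoints)
    return {c for c, k in counts.items() if k / n >= min_fraction}


pages = [
    [(20, 21), (200, 300)],  # shared logo corner plus page-specific content
    [(22, 19), (50, 400)],
    [(21, 20), (300, 90)],
]
common = shared_keypoint_cells(pages)  # only the logo cell (1, 1) survives
```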

In some embodiments, the generation of the model involves detecting keypoints within each screenshot corresponding to the URLs within the training set. The detected keypoints are then used to extract features within each screenshot and generate a feature vector for each screenshot. The detection of keypoints and generation of feature vectors are performed using computer vision techniques such as those mentioned above. Each feature vector within the set of feature vectors is labeled according to the webpage family to which it belongs, and the set of feature vectors is then used to generate the model using machine learning techniques. The machine learning techniques may include, but are not limited or restricted to, the generation of (i) support vector machines (SVMs), (ii) distribution functions such as a naïve Bayes classifier, (iii) a K-nearest neighbor pattern detection algorithm, and/or (iv) a decision tree.
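
Of the listed techniques, K-nearest neighbor is the easiest to show in a few lines. The following sketch assumes equal-length feature vectors paired with family labels, and reports the fraction of the k nearest neighbors voting for each family as that family's confidence; the function name and the vote-fraction confidence are illustrative assumptions.

```python
import math
from collections import Counter


def knn_confidences(training, query, k=3):
    """Vote among the k labeled feature vectors nearest (Euclidean) to the query vector."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


training = [([0, 0], "a"), ([0, 1], "a"), ([1, 0], "a"), ([5, 5], "b"), ([5, 6], "b")]
near_a = knn_confidences(training, [0.2, 0.2])  # all three neighbors are family "a"
mixed = knn_confidences(training, [4, 4])       # two "b" neighbors, one "a"
```

Unlike the SVM or perceptron approaches, K-nearest neighbor has no training step beyond storing the labeled vectors, so its cost shifts to detection time.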

Machine learning techniques rely on a machine learning model, which is executable by a processor to implement a computerized approach to learning from and making predictions based on data sets. These data sets include stored known (labeled) data sets used to train (or tune) the machine learning model in reaching conclusions regarding the data sets, e.g., to classify them and verify the classification by comparison with the labels. The data sets also include one or more unknown (unlabeled) data sets, for which the machine learning model is to reach conclusions (classify) by applying its acquired “learning” with, if trained properly, a high degree of confidence.

In recent years, machine learning technology has been developed and applied in diverse fields (such as computer vision), yielding a great many widely used, executable machine learning computer programs. These programs implement any of a variety of machine learning techniques, commonly referred to as machine learning functions or algorithms. For purposes of this invention, the details of the machine learning functions and their implementation in machine learning software programs need not be described here, as those of skill in the art would readily be able to choose from many commercially or publicly available (“open source”) alternatives to implement this invention, as mentioned above.

Herein, machine learning is used to recognize membership and non-membership in a family of webpages, which serves as a strong indication of phishing attacks. More specifically, the machine learning model represents the correlation of the feature vectors corresponding to the screenshots based on data sets associated with the screenshots. Each data set is a collection of image information, which can be computationally processed pursuant to or by a machine learning function implemented programmatically. The machine learning function operates on keypoints expressed as keypoint descriptors and is generated or tuned using the training set both to select (digitally sample) keypoints and to use their descriptors, formed into feature vectors, in classifying the webpage images. The machine learning function itself is generated and tuned during the training phase to classify the data sets, and is then stored in memory in association with the stored representation of the screenshot images for later use. Once the function is generated and tuned, it can be applied to other (unknown) image data sets for their classification.

The detection process involves receipt of a URL for analysis to determine whether the URL is associated with a phishing cyberattack (“subject URL”). The detection process involves retrieval of a subject screenshot corresponding to the subject URL and detection of the keypoints of the subject screenshot. Keypoint descriptors are then generated that correspond to the detected keypoints. The detection process includes an analysis of the generated keypoint descriptors based on the model generated during the training process to determine a correlation between (i) the keypoints corresponding to the subject URL and (ii) keypoints corresponding to the URLs within the training set that have been categorized into webpage families. One or more screenshots corresponding to the webpage family that is most highly correlated with the keypoints of the subject screenshot are selected. In some embodiments, a plurality of webpage families may be closely correlated with the keypoints of the subject screenshot; the remainder of the detection process would then be performed relative to each webpage family of that plurality. The keypoints of the subject screenshot are compared, via known image comparison techniques in the field of computer vision, to the keypoints of the selected screenshot. In some embodiments, a plurality of screenshots may be associated with a single webpage family as mentioned above, in which case an image comparison is performed between the subject screenshot and each screenshot of the plurality of screenshots corresponding to the most highly correlated webpage family. If the result of this image comparison exceeds a threshold, the subject URL is determined to be associated with a phishing cyberattack. Upon determination that the subject URL is associated with a phishing cyberattack, an alert and/or a report is issued to an administrator or a cybersecurity analyst.
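
The decision logic of the detection process can be sketched as follows, with the classifier and the image comparison passed in as functions so that any model or comparison technique can be plugged in. The function name, the result dictionary and the 0.6 threshold are hypothetical. The sketch also assumes that genuine URLs of the mimicked organization were already cleared by pre-filtering, since a genuine page would otherwise match its own family.

```python
def detect_phishing(subject, classify, family_refs, compare, threshold=0.6):
    """Compare the subject only against the most highly correlated family's screenshots."""
    conf = classify(subject)                 # {family: confidence} from the trained model
    family = max(conf, key=conf.get)         # family most likely being mimicked
    similarity = max(compare(subject, ref) for ref in family_refs[family])
    return {"family": family, "similarity": similarity, "phishing": similarity > threshold}


# Usage with stub classifier and comparison functions.
calls = []


def compare(subject, ref):
    calls.append(ref)
    return {"ref1": 0.4, "ref2": 0.8, "ref3": 0.99}[ref]


result = detect_phishing(
    "subject-screenshot",
    classify=lambda s: {"bank_a": 0.9, "bank_b": 0.1},
    family_refs={"bank_a": ["ref1", "ref2"], "bank_b": ["ref3"]},
    compare=compare,
)
```

Only the two `bank_a` screenshots are compared; `ref3` is never touched, which is the saving over a brute-force comparison against every stored screenshot.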

More particularly, the detection process includes (i) generation of a subject screenshot of a webpage retrieved from a subject URL, (ii) processing of the subject screenshot to identify a set of keypoints, (iii) correlation of the set of keypoints with a set of known benign or known phishing pages using the model, and (iv) if the correlation exceeds a threshold, classification of the subject URL as part of a phishing cyberattack. In some embodiments, the subject screenshot, or content associated therewith, may be retrieved via a centrally located system using an internet browser as discussed below, or by accessing a data caching system that has stored therein previously captured screenshots. In some embodiments, the PDAS performs a pre-filtering process, which may include static scanning of the subject URL (e.g., blacklist or whitelist analysis, heuristics, exploit signature checks and/or vulnerability signature checks). If the subject URL is not determined to be either malicious (i.e., related to a phishing attack) or benign based on the static scanning, a subject screenshot of the subject webpage to which the subject URL resolves is obtained by the PDAS.
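
A minimal sketch of the blacklist/whitelist portion of the static pre-filter follows; heuristics and signature checks are omitted. The domain lists and function names are hypothetical. The suffix match is anchored on a dot so that a look-alike host such as `examplebank.example.phish.test` is not mistaken for the allowlisted `examplebank.example`.

```python
from urllib.parse import urlsplit


def _in_domains(host, domains):
    """Exact match or subdomain match on a label boundary."""
    return any(host == d or host.endswith("." + d) for d in domains)


def prefilter(url, blocklist, allowlist):
    """Return 'malicious', 'benign', or None (undetermined: fall through to visual analysis)."""
    host = (urlsplit(url).hostname or "").lower()
    if _in_domains(host, blocklist):
        return "malicious"
    if _in_domains(host, allowlist):
        return "benign"
    return None


BLOCK, ALLOW = {"phish.test"}, {"examplebank.example"}
verdict = prefilter("https://unknown.example/login", BLOCK, ALLOW)  # None: needs a screenshot
```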

The screenshot may result from the processing of the webpage (the webpage associated with the URL) based on selected characteristics (e.g., selected internet browser applications, operating systems, etc.). A logic module of the PDAS utilizes computer vision techniques to detect keypoints within the subject screenshot and generates a feature vector based on the detected keypoints in the same manner as discussed above with respect to the training process. The feature vector of the subject screenshot is analyzed using the model to determine a set of confidences, with each confidence corresponding to a separate labeled feature vector corresponding to the training set, thus indicating the webpage family having the highest confidence (i.e., which webpage family, and specifically, which webpage, is most likely being mimicked by the subject screenshot). For example, a first confidence corresponds to the likelihood that a screenshot within a first webpage family is being mimicked, and an n-th confidence corresponds to the likelihood that a screenshot within an n-th webpage family is being mimicked. A screenshot of at least a first webpage of the webpage family having the highest confidence is then used in an image comparison operation with the subject screenshot. The image comparison may include a comparison of detected keypoints of the subject screenshot and of the webpage(s) of the webpage family having the highest confidence. When the image comparison results in a match above a predefined threshold, the PDAS determines that the subject webpage and the subject URL are part of a phishing attack.
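
For the keypoint comparison step, one common approach with binary descriptors (such as those produced by BRISK) is to match descriptors by Hamming distance and keep only distinctive matches using a nearest/second-nearest ratio test. The sketch below assumes descriptors stored as Python integers and reports the fraction of subject descriptors that find a distinctive match; the function names, `ratio=0.8` and `max_distance=64` are illustrative, not values from the PDAS.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_score(subject, reference, ratio=0.8, max_distance=64):
    """Fraction of subject descriptors with a distinctive match among the reference descriptors."""
    if not subject or not reference:
        return 0.0
    good = 0
    for d in subject:
        dists = sorted(hamming(d, r) for r in reference)
        # Ratio test: the best match must be clearly better than the runner-up.
        distinctive = len(dists) == 1 or dists[0] < ratio * dists[1]
        if dists[0] <= max_distance and distinctive:
            good += 1
    return good / len(subject)


refs = [0b1111000011110000, 0b0000111100001111, 0b1010101010101010]
unrelated = 0b1100110011001100        # 8 bits from every reference: ambiguous, rejected
score = match_score([refs[0], unrelated], refs)  # 0.5
```

The resulting score is what would be compared with the predefined threshold; rejecting ambiguous matches keeps generic page elements from inflating it.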

In contrast to alternative phishing detection systems that may merely perform image comparisons in a brute-force manner (e.g., comparisons of detected keypoints between a webpage under analysis and each of hundreds or thousands of webpage screenshots), the disclosure provides novel systems and methods that enable a detection process involving computer vision techniques to avoid performing image comparisons between a subject webpage and hundreds or thousands of webpage screenshots, while providing a determination that limits false positives and false negatives through the use of a model trained, prior to the detection process, using the detected keypoints of hundreds or thousands of webpage screenshots. Specifically, a brute-force image comparison of the subject webpage against hundreds or thousands of webpage screenshots within a training set is avoided by generating a model that represents the detected keypoints of each of the webpages within the training set and utilizing the model to obtain a set of confidences, each confidence indicating the likelihood that a webpage of a webpage family is being mimicked by the subject webpage.

To achieve higher efficiencies during analysis compared to alternative systems, the systems and methods described below conduct an image comparison only with respect to the webpage(s) corresponding to the webpage family having the highest confidence of visual similarity to the subject screenshot. An image comparison of (i) the subject screenshot and (ii) the webpage(s) corresponding to the webpage family having the highest confidence is therefore more efficient in time and resources than a brute-force method of performing image comparisons between the subject screenshot and hundreds or thousands of screenshots. Accordingly, by performing the feature generation and classification processes discussed in detail below prior to the image comparison, the disclosure provides systems and methods for detecting phishing URLs and webpages that use resources efficiently and save the processing time previously needed to perform such a determination.

II. Terminology

In the following description, certain terminology is used to describe various features of the invention. For example, each of the terms “logic” and “component” may be representative of hardware, firmware or software that is configured to perform one or more functions. As hardware, the term logic (or component) may include circuitry having data processing and/or storage functionality. Examples of such circuitry may include, but are not limited or restricted to, a hardware processor (e.g., microprocessor, one or more processor cores, a digital signal processor, a programmable gate array, a microcontroller, an application specific integrated circuit “ASIC”, etc.), a semiconductor memory, or combinatorial elements.

Additionally, or in the alternative, the logic (or component) may include software such as one or more processes, one or more instances, Application Programming Interface(s) (API), subroutine(s), function(s), applet(s), servlet(s), routine(s), source code, object code, shared library/dynamic link library (dll), or even one or more instructions. This software may be stored in any type of a suitable non-transitory storage medium, or transitory storage medium (e.g., electrical, optical, acoustical or other form of propagated signals such as carrier waves, infrared signals, or digital signals). Examples of a non-transitory storage medium may include, but are not limited or restricted to a programmable circuit; non-persistent storage such as volatile memory (e.g., any type of random access memory “RAM”); or persistent storage such as non-volatile memory (e.g., read-only memory “ROM”, power-backed RAM, flash memory, phase-change memory, etc.), a solid-state drive, hard disk drive, an optical disc drive, or a portable memory device. As firmware, the logic (or component) may be stored in persistent storage.

Herein, a “communication” generally refers to related data that is received, transmitted, or exchanged within a communication session. The data may include a plurality of packets, where a “packet” broadly refers to a series of bits or bytes having a prescribed format. Alternatively, the data may include a collection of data that may take the form of an individual or a number of packets carrying related payloads, e.g., a single webpage received over a network.

The term “computerized” generally represents that any corresponding operations are conducted by hardware in combination with software and/or firmware.

According to one embodiment of the disclosure, the term “malware” may be broadly construed as any code, communication or activity that initiates or furthers a cyberattack. Malware may prompt or cause unauthorized, anomalous, unintended and/or unwanted behaviors or operations constituting a security compromise of information infrastructure. For instance, malware may correspond to a type of malicious computer code that, as an illustrative example, executes an exploit to take advantage of a vulnerability in a network, network device or software, for example, to gain unauthorized access, harm or co-opt operation of a network device or misappropriate, modify or delete data. Alternatively, as another illustrative example, malware may correspond to information (e.g., executable code, script(s), data, command(s), etc.) that is designed to cause a network device to experience anomalous (unexpected or undesirable) behaviors. The anomalous behaviors may include a communication-based anomaly or an execution-based anomaly, which, for example, could (1) alter the functionality of a network device executing application software in an atypical manner; (2) alter the functionality of the network device executing that application software without any malicious intent; and/or (3) provide unwanted functionality which may be generally acceptable in another context.

A “characteristic” includes data associated with an object under analysis that may be collected without execution of the object, such as metadata associated with the object (e.g., size, name, path, grey scale, etc.) or content of the object (e.g., portions of code).

The term “object” generally relates to content (or a reference to access such content) having a logical structure or organization that enables it to be classified for purposes of analysis for malware. The content may include an executable (e.g., an application, program, code segment, a script, dynamic link library “dll” or any file in a format that can be directly executed by a computer such as a file with an “.exe” extension, etc.), a non-executable (e.g., a storage file; any document such as a Portable Document Format “PDF” document; a word processing document such as a Word® document; an electronic mail “email” message, web page, etc.), or simply a collection of related data. In one embodiment, an object may be a URL or list of URLs. The object may be retrieved from information in transit (e.g., one or more packets, one or more flows each being a plurality of related packets, etc.) or information at rest (e.g., data bytes from a storage medium).

The term “network device” may be construed as any electronic computing system with the capability of processing data and connecting to a network. Such a network may be a public network such as the Internet or a private network such as a wireless data telecommunication network, wide area network, a type of local area network (LAN), or a combination of networks. Examples of a network device may include, but are not limited or restricted to, an endpoint device (e.g., a laptop, a mobile phone, a tablet, a computer, etc.), a standalone appliance, a server, a router or other intermediary communication device, a firewall, etc.

The term “transmission medium” may be construed as a physical or logical communication path between two or more network devices or, between components within a network device. For instance, as a physical communication path, wired and/or wireless interconnects in the form of electrical wiring, optical fiber, cable, bus trace, or a wireless channel using radio frequency (RF) or infrared (IR), may be used. A logical communication path may simply represent a communication path between two or more network devices or between components within a network device.

Finally, the terms “or” and “and/or” as used herein are to be interpreted as inclusive or meaning any one or any combination. Therefore, “A, B or C” or “A, B and/or C” mean “any of the following: A; B; C; A and B; A and C; B and C; A, B and C.” An exception to this definition will occur only when a combination of elements, functions, steps or acts is in some way inherently mutually exclusive.

As this invention is susceptible to embodiments of many different forms, it is intended that the present disclosure is to be considered as an example of the principles of the invention and not intended to limit the invention to the specific embodiments shown and described.

III. General Architecture

Referring to FIG. 1, an exemplary block diagram of a logic flow during performance of a training process by a phishing detection and analysis system (PDAS) according to an embodiment of the invention is shown. The logic flow 100 of the training process illustrates the flow of data among logic modules of the PDAS 400, as seen in FIG. 4, in order to train a classifier 112 (e.g., CV classifier) for use in detecting URLs that resolve to phishing websites.

As an overview, the training process involves receipt of a list of URLs for use in the detection of phishing websites. The list of URLs may be based on internal analytics, a third-party source, or the like. The URLs included in the list may correspond to known benign websites (e.g., those that are often imitated in phishing attacks) and/or known phishing websites. The content fetcher 104 obtains a screenshot of the website for each URL in the list, and the feature generation logic 106 utilizes computer vision techniques to generate keypoint descriptors, also referred to as “features” as mentioned above, based on each screenshot, as discussed below. The features of each screenshot are inserted into a separate vector (a “feature vector”). The feature domain mapper 108 receives the feature vectors and labels each vector according to the website family of the screenshot to which the feature vector corresponds. The labeled feature vectors are then provided to the training module 110, which uses them to generate a model that categorizes the labeled feature vectors. In one embodiment, the model represents a plurality of hyperplanes into which the features of each vector may be categorized. The model is then provided to the classifier 112 for use in the detection process, as discussed with respect to at least FIG. 2.

More specifically, the training process of FIG. 1 begins with the content fetcher 104 receiving a list of URLs from a source. The source may include, but is not limited or restricted to, a third-party website, an administrator and/or a cybersecurity analyst (hereinafter referred to as “an analyst”). Specifically, the list of URLs provided to the content fetcher 104 may be used to fetch the data used in training the classifier 112. As one non-limiting example, a detection process may be focused on detecting phishing websites attempting to mimic banking websites, which are often imitated in phishing attacks. However, the disclosure should not be limited to banking websites; instead, any website may be used by the PDAS 400. For purposes of clarity, the examples discussed herein involve banking websites.

Upon receiving the list of URLs, the content fetcher 104 obtains a screenshot of the website to which each provided URL resolves. The content fetcher 104 obtains a screenshot by using an internet browser to access the URL and render the webpage to which the URL resolves; after waiting a specified timeout period during which the webpage rendering is completed, a screenshot is captured and saved as an object (e.g., an image file such as a JPEG). Alternatively, as discussed above, the content fetcher 104 may obtain the screenshot from a data caching system when the screenshot has been previously obtained and stored therein.
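The caching behavior can be sketched as follows. This is a minimal illustration, not the content fetcher 104 itself: the class name and the `capture` callable are hypothetical, and `capture` stands in for whatever browser automation renders the URL, waits for the timeout, and returns image bytes.

```python
from typing import Callable, Dict


class CachingContentFetcher:
    """Hypothetical content fetcher that reuses previously captured screenshots.

    `capture` is an assumed callable that drives a browser to render `url`,
    waits `timeout_s` seconds for rendering to finish, and returns image bytes.
    """

    def __init__(self, capture: Callable[[str, float], bytes], timeout_s: float = 5.0) -> None:
        self._capture = capture
        self._timeout_s = timeout_s
        self._cache: Dict[str, bytes] = {}

    def screenshot(self, url: str) -> bytes:
        # Serve from the cache when this URL has already been captured.
        cached = self._cache.get(url)
        if cached is not None:
            return cached
        image = self._capture(url, self._timeout_s)
        self._cache[url] = image
        return image


# Usage with a stand-in capture function that records each browser call.
browser_calls = []


def fake_capture(url: str, timeout_s: float) -> bytes:
    browser_calls.append(url)
    return b"JPEG bytes for " + url.encode()


fetcher = CachingContentFetcher(fake_capture)
first = fetcher.screenshot("https://bank.example/login")
second = fetcher.screenshot("https://bank.example/login")
```

The second request is served from the cache, so the browser is driven only once per URL.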

Upon obtaining a screenshot of the website to which each URL resolves, the content fetcher 104 may provide one or more screenshots to the feature generation logic 106. It should be noted that the content fetcher 104 may provide screenshots (or identifiers thereof, such as file names) to the feature generation logic 106 while other screenshots from the list of URLs are still being collected, as opposed to obtaining all of the screenshots before passing them along to the feature generation logic 106. For each screenshot, the feature generation logic 106 is responsible for: (1) detecting keypoints within the screenshot, (2) generating keypoint descriptors based on the detected keypoints, and (3) generating a feature vector that includes the generated keypoint descriptors. The feature generation logic 106 uses computer vision techniques to detect the keypoints. According to one embodiment of the disclosure, the computer vision techniques may include detection of groupings of pixels having predetermined characteristics, such as specified changes in greyscale. The feature generation logic 106 may utilize the computer vision techniques to detect edges and corners in images within the screenshot or, more generally, to perform density location operations, which detect groupings of pixels within the screenshot that include a high density of pixels (e.g., non-white space). The keypoints detected within a screenshot may be related to one another based on geometric measurements (e.g., distance between sets of keypoints, angle of intersection between sets of keypoints, etc.). Specific keypoint detection procedures are well known to those skilled in the art, and various computer vision techniques may be utilized to detect keypoints.
One example of a computer vision technique that may be utilized is blob detection based on one or more matrices, e.g., the Hessian matrix, which determines the change in greyscale of pixels and generates keypoints where the change, according to a calculated determinant of one or more of the matrices, exceeds a predefined threshold. The term “blob” may refer to a region of pixels. Further, in one embodiment, the edge, corner and/or blob detection used to detect keypoints, which are then placed in a feature vector as discussed herein, may depend on comparing properties of a screenshot region, such as brightness, color, or greyscale, with those of surrounding regions.
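A simplified determinant-of-Hessian blob detector, similar in spirit to the detector used by SURF but single-scale and without Gaussian smoothing, might look like the pure-Python sketch below. The function name, the finite-difference approximation, and the 3×3 local-maximum rule are assumptions for illustration, not the detector of the disclosure.

```python
def hessian_keypoints(gray, threshold):
    """Detect blob-like keypoints from the determinant of the Hessian.

    `gray` is a list of rows of greyscale values. Second derivatives are
    approximated with central finite differences; a pixel becomes a keypoint
    when its determinant exceeds `threshold` and is the maximum of its 3x3
    neighbourhood.
    """
    h, w = len(gray), len(gray[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dxx = gray[y][x + 1] - 2 * gray[y][x] + gray[y][x - 1]
            dyy = gray[y + 1][x] - 2 * gray[y][x] + gray[y - 1][x]
            dxy = (gray[y + 1][x + 1] - gray[y + 1][x - 1]
                   - gray[y - 1][x + 1] + gray[y - 1][x - 1]) / 4.0
            det[y][x] = dxx * dyy - dxy * dxy
    keypoints = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            value = det[y][x]
            if value <= threshold:
                continue
            neighbours = (det[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0))
            if all(value >= n for n in neighbours):
                keypoints.append((y, x))
    return keypoints


# A dark 7x7 image with one bright pixel in the centre forms a single blob.
image = [[0] * 7 for _ in range(7)]
image[3][3] = 255
print(hessian_keypoints(image, threshold=1000.0))  # [(3, 3)]
```

A positive determinant means the greyscale surface curves the same way in both directions, which is the signature of a blob rather than an edge.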

Upon detecting a plurality of keypoints within a screenshot, the feature generation logic 106 determines a keypoint descriptor for each keypoint. A keypoint descriptor may be generated by extracting a predefined-sized block of pixels including the keypoint, dividing the block of pixels into sub-blocks, and taking into account a plurality of possible orientations of the pixels and storing such information (a keypoint descriptor may be referred to herein as a feature). In one embodiment, a vector (“a feature vector”) may then be created for each screenshot, the feature vector storing the plurality of keypoint descriptors for that screenshot.
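One common way to realize the block, sub-block, and orientation steps is a SIFT-style orientation histogram. The sketch below assumes a 16×16 block, a 4×4 grid of sub-blocks, and 8 orientation bins (giving 128 values); these parameters and the function name are illustrative choices, and the sketch omits SIFT's rotation normalization and Gaussian weighting.

```python
import math


def keypoint_descriptor(gray, keypoint, block=16, cells=4, bins=8):
    """Build a SIFT-style descriptor for one keypoint.

    A block x block patch centred on the keypoint is split into cells x cells
    sub-blocks; each sub-block accumulates a histogram of gradient orientations
    (weighted by gradient magnitude) with `bins` bins. The concatenated
    histograms are L2-normalised.
    """
    y0, x0 = keypoint
    half, cell = block // 2, block // cells
    h, w = len(gray), len(gray[0])
    hist = [0.0] * (cells * cells * bins)
    for by in range(block):
        for bx in range(block):
            y, x = y0 - half + by, x0 - half + bx
            if not (1 <= y < h - 1 and 1 <= x < w - 1):
                continue  # gradient needs both neighbours inside the image
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            index = ((by // cell) * cells + bx // cell) * bins + b
            hist[index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist


# The feature vector for a screenshot holds one descriptor per keypoint.
ramp = [[10 * x for x in range(21)] for _ in range(21)]  # brightness rises left to right
feature_vector = [keypoint_descriptor(ramp, kp) for kp in [(10, 10)]]
```

On the left-to-right ramp every gradient points the same way, so each of the 16 sub-blocks puts all of its weight into the first orientation bin.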

The feature generation logic 106 then provides the set of feature vectors for the screenshots to the feature domain mapper 108. For a first feature vector, the feature domain mapper 108 labels the first feature vector according to the webpage family of the webpage from which its features were generated (i.e., the webpage to which the corresponding URL resolves). As a non-limiting example, a feature vector is generated for a URL that resolves to a Bank of America webpage (e.g., a log-in webpage). The feature vector (containing the generated features of the Bank of America webpage) is then labeled as “Bank of America.” Herein, labeling may correspond to appending an identifier to, or otherwise associating an identifier with, a feature vector. The feature domain mapper 108 performs the labeling process for each feature vector; in one embodiment, the webpage family may be provided along with the screenshot from the content fetcher 104 and passed along with the feature vector from the feature generation logic 106.

The labeled feature vectors are provided to the training module 110, which generates a model, based on the plurality of feature vectors, that associates feature vectors according to their labels. The association of feature vectors may be based on a correlation of the plurality of feature vectors above a predefined (or variable) threshold. As mentioned above, in one embodiment, the model may represent a plurality of hyperplanes into which the features of each vector may be categorized. In such an embodiment, each URL may be representative of a webpage family, with each webpage family having its own hyperplane. Each of the plurality of hyperplanes may be generated by the training module 110 based on the keypoint descriptors discussed above, as well as the keypoints themselves (i.e., the keypoint center coordinates x and y, a scale of the keypoint, and an orientation of the keypoint). The training module 110 may then generate a model that represents the plurality of hyperplanes. The model is then provided to the classifier 112 for use in the detection process, as discussed with respect to at least FIG. 2. In alternative embodiments, the model may represent the categorization of the plurality of feature vectors in other forms, such as a histogram, wherein each bin of the histogram includes the feature vector corresponding to a webpage within the training set.
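As one way to picture a “hyperplane per webpage family,” the sketch below trains one-vs-rest perceptrons. Several assumptions are made here that the text does not specify: each screenshot's variable number of descriptors is mean-pooled into one fixed-length vector, keypoint coordinates, scale, and orientation are ignored, and the perceptron is swapped in as a simple linear learner. Family names such as `BankA` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Hyperplane = Tuple[List[float], float]  # (weights, bias)


def mean_pool(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a variable number of keypoint descriptors into one fixed-length vector."""
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]


def train_hyperplanes(samples: Sequence[Tuple[str, List[float]]],
                      epochs: int = 50, lr: float = 0.1) -> Dict[str, Hyperplane]:
    """Train one-vs-rest perceptrons: one separating hyperplane per webpage family."""
    families = sorted({family for family, _ in samples})
    dim = len(samples[0][1])
    model: Dict[str, Hyperplane] = {}
    for family in families:
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            for label, x in samples:
                target = 1.0 if label == family else -1.0
                score = sum(wi * xi for wi, xi in zip(w, x)) + b
                if target * score <= 0:  # misclassified or on the hyperplane
                    w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                    b += lr * target
        model[family] = (w, b)
    return model


def predict_family(model: Dict[str, Hyperplane], x: Sequence[float]) -> str:
    """Return the family whose hyperplane gives the largest signed score."""
    return max(model, key=lambda f: sum(wi * xi for wi, xi in zip(model[f][0], x)) + model[f][1])


training_set = [
    ("BankA", mean_pool([[1.0, 0.0], [1.0, 0.0]])),
    ("BankA", [0.9, 0.1]),
    ("BankB", [0.0, 1.0]),
    ("BankB", [0.1, 0.9]),
]
model = train_hyperplanes(training_set)
```

A production system would more likely use a margin-based learner such as a linear SVM; the perceptron is used here only because it fits in a few self-contained lines.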

Referring now to FIG. 2, an exemplary block diagram of a logic flow during performance of a detection process by a PDAS according to an embodiment of the invention is shown. The logic flow 200 of the detection process illustrates the flow of data among logic modules of the PDAS 400, as seen in FIG. 4, in detecting URLs that resolve to phishing websites.

As a general overview, the detection process begins when the PDAS 400 receives a subject URL. In one embodiment, the pre-filter 116 is provided the subject URL and performs a pre-filtering step, discussed below. However, according to another embodiment, the PDAS 400 may receive an object and, in such an embodiment, an optional URL extractor 114 may first extract the subject URL (e.g., from an email or other object) and provide the extracted subject URL to the pre-filter 116 for pre-filtering. The pre-filter 116 performs a pre-filtering process, such as one or more static scans, on the URL, which may include performing whitelist/blacklist comparisons. When the subject URL is not found to be either malicious or benign, the subject URL is provided to the content fetcher 104, which obtains (or, in some embodiments, generates) a screenshot of the webpage to which the URL resolves, as discussed above.

The content fetcher 104 retrieves the contents of the subject webpage from the URL and then provides the subject screenshot of those contents (e.g., an image file, or an identifier enabling retrieval of the image file), as rendered by an internet browser, to the feature generation logic 106. As discussed above, the feature generation logic 106 detects keypoints within the subject screenshot and generates a feature vector based on the detected keypoints. The feature vector corresponding to the subject screenshot is provided to the classifier 112 for webpage family classification based on the model generated by the training module 110. As discussed above with respect to FIG. 1, each webpage family may correspond to a URL domain (e.g., each webpage family may correspond to the domain of a bank website, such that the webpage families may be, for example, Bank of America, Wells Fargo, First Republic, etc.). A confidence may be determined for each webpage family based on an analysis of the detected keypoint descriptors of the subject screenshot in accordance with the model. The confidence determined for a webpage family indicates the likelihood that the subject screenshot is attempting to mimic the webpage corresponding to that webpage family.

The webpage family having the highest confidence may be passed to an image comparator 120 (e.g., CV image comparator), which performs an image comparison between the subject screenshot and the webpage corresponding to the webpage family with the highest confidence. In some embodiments, the one or more webpage families having the highest confidences are passed to the image comparator 120, which performs the image comparison for screenshots of webpages corresponding to those webpage families. During training, a feature vector is determined for each webpage family and each feature vector is utilized in generating the model. The model provides confidences for each feature vector corresponding to URLs within the training set; thus, the set of confidences indicates both the webpage family having the highest confidence and the feature vector, corresponding to a particular screenshot, having the highest confidence. In a second embodiment, a webpage within a webpage family may be predefined as the webpage to be used in an image comparison when that webpage family is determined to have the highest confidence. In another embodiment, two or more (or all) webpages within a webpage family may be indicated as having the highest correlation and/or two or more (or all) webpages within the webpage family may be predefined as those to be used in an image comparison.

When the image comparison results in a match (e.g., a correlation value) above a predefined threshold, the PDAS 400 determines that the subject webpage and the subject URL itself are part of a phishing attack. Where two or more webpages within a webpage family are used, the comparison with the subject webpage may be made separately for each of the webpages and, in alternative embodiments, (i) if any, or a prescribed number, of the resulting correlation values exceed the threshold, the URL is declared part of a phishing attack, or (ii) if the correlation value determined by statistically combining the separate correlation values (e.g., by determining the mean, median, or mode of the separate correlation values) exceeds the threshold, the URL is declared part of a phishing attack.
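The alternative decision rules can be expressed compactly. In this sketch the function name, the rule labels, and the sample correlation values are hypothetical; only the mean and median are shown as statistical combinations, since a mode is rarely informative for continuous correlation values, though it could be added in the same way.

```python
from statistics import mean, median
from typing import Sequence


def declare_phishing(correlations: Sequence[float], threshold: float,
                     rule: str = "any", min_matches: int = 1) -> bool:
    """Decide whether a URL is phishing from per-webpage correlation values.

    rule="any":    any single correlation exceeds the threshold
    rule="count":  at least `min_matches` correlations exceed the threshold
    rule="mean" / "median": the combined correlation exceeds the threshold
    """
    if not correlations:
        return False
    if rule == "any":
        return any(c > threshold for c in correlations)
    if rule == "count":
        return sum(c > threshold for c in correlations) >= min_matches
    if rule == "mean":
        return mean(correlations) > threshold
    if rule == "median":
        return median(correlations) > threshold
    raise ValueError(f"unknown rule: {rule}")


scores = [0.55, 0.92, 0.61]  # hypothetical correlations against three family webpages
```

With a threshold of 0.8, the “any” rule flags these scores, while requiring two matches or using the mean or median does not; the choice of rule trades sensitivity against false positives.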

In contrast to performing image comparisons against a large body of screenshots, the above-described detection process involving computer vision techniques analyzes only the subset of screenshots that is relevant to the subject screenshot, as selected according to level of confidence.

More specifically, the detection process of FIG. 2 begins when the PDAS 400 receives a URL. As discussed above, the PDAS 400 may receive an object and, in such an embodiment, an optional URL extractor 114 may first extract the URL from the object and provide the extracted URL to the pre-filter 116 for pre-filtering. In another embodiment, the PDAS 400 may be provided with a URL, which may be passed directly to the pre-filter 116.
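For the case where the object is an email, a URL extractor might walk the message's text parts and collect links. The sketch below uses Python's standard `email` package; the regular expression, the trailing-punctuation handling, and the function name are simplifying assumptions rather than the behavior of the URL extractor 114.

```python
import re
from email import message_from_string
from typing import List

# Hypothetical pattern: http(s) URLs up to whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def extract_urls(raw_email: str) -> List[str]:
    """Return the distinct URLs found in the text parts of an email, in order."""
    message = message_from_string(raw_email)
    urls: List[str] = []
    for part in message.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments such as images
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
        for match in URL_PATTERN.findall(text):
            url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
            if url not in urls:
                urls.append(url)
    return urls


raw = (
    "From: alerts@bank.example\n"
    "Subject: Password change required\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Please update your password at https://bank-example.test/reset today.\n"
    "Or visit https://bank-example.test/reset.\n"
)
found = extract_urls(raw)
```

Both mentions resolve to the same URL, so only one subject URL is passed on to the pre-filter.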

The pre-filter 116 performs a pre-filtering process on the URL, which may include one or more static scans such as whitelist/blacklist comparisons. In particular, the whitelist/blacklist database 118 stores data corresponding to whitelisted URLs (indicators determined to be benign) as well as blacklisted URLs (indicators determined to be associated with cyberattacks, e.g., phishing attacks). Comparisons performed by the pre-filter 116 against the whitelisted and blacklisted URLs stored in the whitelist/blacklist database 118 seek to remove any URLs known to be either benign or malicious. As a result of removing known benign or malicious URLs from the analysis, URLs passed on by the pre-filter 116 (i.e., not known to be benign or malicious) that resolve to webpages very closely resembling known benign webpages (e.g., those of Bank of America, Wells Fargo, etc.) or known malicious webpages (e.g., known phishing webpages) are determined to be phishing webpages. Specifically, known benign URLs (e.g., legitimate URLs of Bank of America, Wells Fargo, etc.) may be removed from the detection analysis by the pre-filter 116; thus, a URL that is not removed by the pre-filter 116 and that resolves to a webpage very closely resembling the “look and feel” (graphic representation) of a benign webpage may be determined to be a phishing URL.
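A static whitelist/blacklist pre-filter can be sketched as follows. It assumes that list entries are either full URLs or hostnames and that the blacklist is consulted first; both choices, the `Verdict` enum, and the example domains are hypothetical.

```python
from enum import Enum
from typing import Set
from urllib.parse import urlsplit


class Verdict(Enum):
    BENIGN = "benign"
    MALICIOUS = "malicious"
    UNKNOWN = "unknown"  # not on either list: continue to screenshot analysis


def pre_filter(url: str, whitelist: Set[str], blacklist: Set[str]) -> Verdict:
    """Static scan: match the full URL or its hostname against each list."""
    host = (urlsplit(url).hostname or "").lower()
    if url in blacklist or host in blacklist:
        return Verdict.MALICIOUS
    if url in whitelist or host in whitelist:
        return Verdict.BENIGN
    return Verdict.UNKNOWN


whitelist = {"bank.example"}
blacklist = {"https://phish.test/login"}
```

Only URLs that come back `UNKNOWN`, such as a look-alike domain that appears on neither list, proceed to the screenshot-based analysis.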

When the URL is not found to be either malicious or benign (e.g., the URL is not present in the blacklist or whitelist), the URL is provided to the content fetcher 104, which obtains a screenshot of the webpage to which the URL resolves, as discussed above with respect to the training process in accordance with FIG. 1. The content fetcher 104 then provides the screenshot of the webpage (e.g., an image file, or an identifier enabling retrieval of the image file) to the feature generation logic 106. The feature generation logic 106 uses computer vision techniques to detect keypoints within the screenshot. For each keypoint, the feature generation logic 106 extracts a block of pixels of a predetermined size (e.g., a 16×16 block) that includes the keypoint. Each block of pixels is then used to generate a keypoint descriptor for the keypoint included within the block of pixels, as discussed above. The plurality of keypoint descriptors describing the keypoints detected within a screenshot is stored in a vector, referred to herein as a “feature vector.” Specifically, the feature vector represents a description of the keypoints of the subject screenshot. The feature vector is then provided to the classifier 112.

The classifier 112 uses the feature vector of the subject screenshot as an input to the model generated during training. Analyzing the feature vector of the subject screenshot using the model results in a plurality of confidences. Each confidence of the plurality of confidences corresponds to a separate webpage family of the URLs provided to the PDAS 400 during training (“the training set”). As an illustrative example, when the training set includes URLs for Bank of America, Wells Fargo, First Republic, and other known banking webpages for a total of twenty (20) banking webpages in the training set, the analysis of the feature vector of the subject screenshot during the detection process may result in 20 confidences. Specifically, a first confidence may correspond to the Bank of America webpage, a second confidence may correspond to the Wells Fargo webpage, etc., with each confidence indicating the likelihood that the subject webpage is attempting to mimic the webpage corresponding to the webpage family. Continuing the example, the first confidence indicates the likelihood that the subject webpage is attempting to mimic the Bank of America webpage based on how closely the subject webpage resembles the “look and feel” of the Bank of America webpage.
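The text does not say how raw model outputs become confidences. One plausible choice, assumed here, is to score the subject vector against each family's hyperplane and normalize the scores with a softmax so that the confidences sum to one; the family names and weights below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Hyperplane = Tuple[List[float], float]  # (weights, bias)


def family_confidences(model: Dict[str, Hyperplane], x: Sequence[float]) -> Dict[str, float]:
    """Map each webpage family to a confidence in [0, 1]; confidences sum to 1."""
    scores = {family: sum(wi * xi for wi, xi in zip(w, x)) + b
              for family, (w, b) in model.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {family: math.exp(s - top) for family, s in scores.items()}
    total = sum(exps.values())
    return {family: e / total for family, e in exps.items()}


def top_families(confidences: Dict[str, float], k: int = 1) -> List[str]:
    """Return the k families with the highest confidence, best first."""
    return sorted(confidences, key=confidences.get, reverse=True)[:k]


model = {"BankA": ([1.0, 0.0], 0.0), "BankB": ([0.0, 1.0], 0.0), "BankC": ([-1.0, -1.0], 0.0)}
confidences = family_confidences(model, [2.0, 0.5])
```

There is one confidence per family in the model, so a training set of twenty families yields twenty confidences; `top_families` with `k` greater than 1 covers the embodiment that forwards several families to the image comparator.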

The webpage family having the highest confidence may be passed to the image comparator 120, which performs an image comparison between the subject screenshot and the webpage corresponding to the webpage family with the highest confidence. The image comparison may perform an in-depth comparison of the keypoints of the subject screenshot, according to the keypoint descriptors within its feature vector, with the keypoints of the webpage corresponding to the webpage family having the highest confidence, to determine how closely the subject screenshot matches that webpage. When the image comparison results in a match above a predefined threshold, the PDAS 400 determines that the subject webpage and the subject URL itself are part of a phishing attack.
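One common way to compare keypoint descriptors is nearest-neighbor matching with a ratio test (as popularized for SIFT). The sketch below reports the fraction of subject descriptors that find a distinctive match among the reference descriptors; using that fraction as the correlation value, the 0.75 ratio, and the function name are all assumptions for illustration.

```python
import math
from typing import Sequence


def match_score(subject: Sequence[Sequence[float]],
                reference: Sequence[Sequence[float]],
                ratio: float = 0.75) -> float:
    """Fraction of subject descriptors with a distinctive nearest neighbour in reference.

    A descriptor counts as matched when its closest reference descriptor is
    clearly closer than the second closest (distance ratio below `ratio`).
    """
    if not subject or len(reference) < 2:
        return 0.0
    matches = 0
    for d in subject:
        distances = sorted(math.dist(d, r) for r in reference)
        if distances[0] < ratio * distances[1]:
            matches += 1
    return matches / len(subject)


reference = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
close_copy = [[0.1, 0.0], [9.9, 0.1]]  # near-duplicates of reference descriptors
ambiguous = [[5.0, 5.0]]               # equidistant from every reference descriptor
```

The resulting score can then be compared with the predefined threshold; ambiguous descriptors that sit equally close to several reference descriptors are deliberately not counted as matches.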

When the subject URL and the subject webpage are determined to be part of a phishing attack, the reporting engine 122 generates an alert to a cybersecurity analyst, an administrator, and/or users of one or more endpoints indicating that the subject URL and subject webpage are part of a phishing attack.

In additional embodiments, a webpage family may include a plurality of webpages (e.g., Bank of America login webpages) that vary slightly. In such an embodiment, during the training process, the feature domain mapper 108 may label the feature vectors of two or more such webpages with the same webpage family, and the feature vectors may be mapped to the same hyperplane during the generation of the model by the training module 110.

In some embodiments, two webpage families may correspond to the same overall webpage “owner.” For example, Bank of America may have multiple login webpages whose “look and feel” differs: a first Bank of America login webpage may include two text boxes for entry of a customer's username and password, while a second Bank of America login webpage may include three text boxes for entry of a customer's email address, social security number and birthday. Thus, for purposes of the training and detection processes, the first and second Bank of America login webpages differ in terms of their “look and feel” and may be afforded separate webpage families. However, both webpage families may be linked to Bank of America for the detection process.

Referring to FIGS. 3A-3B, a flowchart illustrating an exemplary method for analyzing a URL by the PDAS of FIG. 4 to determine whether the URL is associated with a phishing attack is shown. Each block illustrated in FIGS. 3A-3B represents an operation performed in the method 300 of detecting whether a URL is associated with a phishing attack by the phishing detection and analysis system (PDAS). Herein, the method 300 starts when the PDAS receives an object for phishing analysis (block 302). In one embodiment, the object may be a URL for analysis. However, in an alternative embodiment, the object may be, for example, an email message (email) wherein the content of the email includes a URL. In such an embodiment in which the object is an email, the method 300 may include an optional step of extracting the URL from the email (optional block 304).

Subsequently, the method 300 includes an operation of performing a pre-filter check on the URL (block 306). In one embodiment, the pre-filter check includes a static analysis of the URL, which may include, but is not limited to, a comparison with one or more entries of a whitelist and/or a comparison with one or more entries of a blacklist. In some embodiments, when the object is deemed suspicious and/or cannot be determined to be either benign or phishing, the method 300 continues analysis of the object by obtaining a screenshot of the webpage to which the URL resolves (“URL screenshot” as mentioned above) (block 308).

Upon obtaining the URL screenshot, the method 300 detects keypoints within the URL screenshot based on computer vision techniques and determines keypoint descriptors. Based on the keypoints and the determined keypoint descriptors, a feature vector is generated that includes the keypoints and their keypoint descriptors (block 310). The keypoints may include, inter alia, high-contrast regions of the URL screenshot. In one embodiment, a high-contrast region may refer to a set of two or more pixels in which the change in greyscale value between two neighboring pixels is greater than or equal to a predefined threshold. Additionally, in some embodiments, the URL screenshot may be in color, and in such embodiments, the detection may include detection of a variance in color hues above a predefined threshold (e.g., a change in the red, green, and blue values defining the pixel).
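The neighboring-pixel test for high-contrast regions can be sketched directly. This sketch assumes greyscale pixels are integers and color pixels are (red, green, blue) tuples, and it measures a color change as the largest per-channel difference; both the representation and the function names are illustrative.

```python
from typing import List, Sequence, Tuple, Union

Pixel = Union[int, Tuple[int, int, int]]


def pixel_delta(a: Pixel, b: Pixel) -> int:
    """Greyscale: absolute difference. RGB: largest per-channel difference."""
    if isinstance(a, tuple):
        return max(abs(ca - cb) for ca, cb in zip(a, b))
    return abs(a - b)


def high_contrast_pixels(image: Sequence[Sequence[Pixel]], threshold: int) -> List[Tuple[int, int]]:
    """Return pixels that differ from a right or lower neighbour by >= threshold."""
    h, w = len(image), len(image[0])
    hits = set()
    for y in range(h):
        for x in range(w):
            # Checking only right and lower neighbours visits each adjacent pair once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and pixel_delta(image[y][x], image[ny][nx]) >= threshold:
                    hits.update({(y, x), (ny, nx)})
    return sorted(hits)


grey = [[0, 0, 255],
        [0, 0, 255]]  # a vertical edge between columns 1 and 2
```

Both pixels of each qualifying pair are reported, reflecting the definition of a high-contrast region as a set of two or more pixels.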

The feature vector is provided to a classifier, e.g., the classifier 112 as seen in FIGS. 1-2, and analyzed according to the model generated during the training process as discussed above. The analysis of the feature vector with the model results in a determination of a confidence for each feature vector included in the training set of URLs (block 312).

The webpage having the highest confidence based on the analysis using the model is provided to, e.g., the CV image comparator 120 as seen in FIG. 2, which performs an image comparison between the URL screenshot and a screenshot of the webpage corresponding to the feature vector having the highest confidence (block 314). When the result of the image comparison is less than a predefined threshold, e.g., indicating a match of the two screenshots does not meet the predefined threshold (no at block 316), the method 300 determines the subject URL is not a phishing URL (block 318).

However, when the result of the image comparison is greater than or equal to the predefined threshold, e.g., indicating a match of the two screenshots meets or exceeds the predefined threshold (yes at block 316), the method 300 determines the subject URL is a phishing URL (block 320) and subsequently generates and issues an alert (block 322). The alert may be issued to, for example, a user attempting to access the URL using an endpoint device, a network administrator and/or a cybersecurity analyst.
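The control flow of method 300 can be summarized as a single function with pluggable steps. Every callable here is a hypothetical stand-in for the corresponding component (pre-filter, content fetcher, feature generation, classifier, image comparator, reporting), and the verdict strings are illustrative.

```python
from typing import Callable, Dict, List


def analyze_url(url: str,
                pre_filter: Callable[[str], str],
                fetch_screenshot: Callable[[str], bytes],
                make_feature_vector: Callable[[bytes], List[float]],
                classify: Callable[[List[float]], Dict[str, float]],
                compare: Callable[[bytes, str], float],
                alert: Callable[[str], None],
                threshold: float) -> str:
    """Walk through blocks 306-322 of method 300 with pluggable steps."""
    verdict = pre_filter(url)                         # block 306
    if verdict in ("benign", "malicious"):
        return verdict                                # known URL: no further analysis
    screenshot = fetch_screenshot(url)                # block 308
    features = make_feature_vector(screenshot)        # block 310
    confidences = classify(features)                  # block 312
    best_family = max(confidences, key=confidences.get)
    score = compare(screenshot, best_family)          # block 314
    if score >= threshold:                            # block 316: yes
        alert(url)                                    # blocks 320-322
        return "phishing"
    return "not phishing"                             # block 318


# Usage with stub components.
alerts: List[str] = []
result = analyze_url(
    "https://bank-example.test/login",
    pre_filter=lambda u: "unknown",
    fetch_screenshot=lambda u: b"screenshot",
    make_feature_vector=lambda s: [0.1, 0.2],
    classify=lambda f: {"BankA": 0.8, "BankB": 0.2},
    compare=lambda s, family: 0.93 if family == "BankA" else 0.0,
    alert=alerts.append,
    threshold=0.9,
)
```

Injecting each step keeps the flowchart logic testable in isolation from browsers and computer vision libraries.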

FIG. 4 is an exemplary embodiment of a logical representation of the phishing detection and analysis system of FIG. 1. The phishing detection and analysis system (PDAS) 400, in an embodiment, may be stored on a non-transitory computer-readable storage medium of an endpoint device that includes a housing, which may be made entirely or partially of a hardened material (e.g., hardened plastic, metal, glass, composite or any combination thereof) that protects the circuitry within the housing, namely one or more processors 402 that are coupled to a communication interface 404 via a first transmission medium 406. The communication interface 404, in combination with a communication interface logic 412, enables communications with external network devices and/or other network appliances to receive updates for the PDAS 400. According to one embodiment of the disclosure, the communication interface 404 may be implemented as a physical interface including one or more ports for wired connectors. Additionally, or in the alternative, the communication interface 404 may be implemented with one or more radio units for supporting wireless communications with other electronic devices. The communication interface logic 412 may include logic for performing operations of receiving and transmitting one or more objects via the communication interface 404 to enable communication between the PDAS 400 and network devices via a network (e.g., the internet) and/or cloud computing services, not shown.

The processor(s) 402 are further coupled to a persistent storage 410 via a second transmission medium 408. According to one embodiment of the disclosure, the persistent storage 410 may include the following logic as software modules: the pre-filter 116, the URL extractor 114, the content fetcher 104, the feature generation logic 106, the feature domain mapper 108, the training module 110, the classifier 112, the image comparator 120, the reporting engine 122, and the communication interface logic 412. The operations of these software modules, upon execution by the processor(s) 402, are described above. The whitelist/blacklist database 118 stores data for access by the pre-filter 116. Of course, it is contemplated that some or all of this logic may be implemented as hardware, and if so, such logic modules could be implemented separately from each other.

In the foregoing description, the invention is described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims.

Claims

1. A computerized method for analyzing a subject Uniform Resource Locator (URL) by a phishing detection and analysis system (PDAS) to determine whether the subject URL is associated with a phishing attack, the PDAS including one or more processors and a storage medium, the method comprising: performing, by the PDAS, a first set of operations including: detecting a plurality of keypoints within a subject screenshot of a subject webpage corresponding to the subject URL; providing the plurality of keypoints to a machine learning model, wherein the machine learning model is a representation of relationships between a set of training feature vectors representing a training set of URLs, each of the training feature vectors corresponds to a separate webpage family within a set of webpage families; executing the machine learning model using the plurality of keypoints as input to the machine learning model to determine a set of confidences, wherein each confidence within the set of confidences reflects a potential relationship between the subject screenshot and a webpage family within the set of webpage families, and wherein a first webpage family within the set of webpage families is associated with a highest confidence of the set of confidences; performing, by the PDAS, a second set of operations including: performing an image comparison between the subject screenshot and one or more screenshots corresponding to webpages within the first webpage family; determining whether a result of the image comparison exceeds a predefined threshold; responsive to the determining that the result of the image comparison exceeds the predefined threshold, generating an alert or report indicating that the subject URL is associated with the phishing attack.
2. The computerized method of claim 1, further comprising: receiving, by the PDAS, the subject URL; and obtaining, by the PDAS, the subject screenshot of the subject webpage corresponding to the subject URL.
3. The computerized method of claim 1, further comprising: performing, by the PDAS, a pre-filter check on the subject URL, the pre-filter check including one or more static analyses.
4. The computerized method of claim 1, further comprising: determining, by the PDAS, from the subject screenshot, one or more keypoint descriptors for each of the plurality of keypoints, each of the one or more keypoint descriptors including one or more parameters of a keypoint of the plurality of keypoints; and generating, by the PDAS, a feature vector including the plurality of keypoints and the one or more keypoint descriptors for each of the plurality of keypoints, wherein determining, by the PDAS, the set of confidences includes analyzing the feature vector according to the machine learning model.
5. The computerized method of claim 1, wherein the image comparison includes: retrieving, by the PDAS, previously determined keypoints of a first screenshot of the one or more screenshots; and correlating, by the PDAS, the previously determined keypoints of the first screenshot with the plurality of keypoints of the subject screenshot.
6. The computerized method of claim 1, wherein the subject screenshot includes image data of the subject webpage that is configured to be displayable on a computer screen.
7. The computerized method of claim 1, wherein each keypoint of the plurality of keypoints includes an image region having a known orientation within a set of pixels representing the subject screenshot.
8. The computerized method of claim 1, wherein a first confidence corresponds to a likelihood that the subject webpage is mimicking a first screenshot of a first webpage of the first webpage family, the first confidence included within the set of confidences.
9. The computerized method of claim 1, wherein the machine learning model is generated and trained with a plurality of screenshots of each of a plurality of webpages using a combination of an internet browser and an operating system.
10. The computerized method of claim 1, wherein the performing of the image comparison includes: performing the image comparison between the subject screenshot and one or more screenshots corresponding to webpages of the first webpage family until a result of an image comparison is greater than or equal to the predefined threshold.
11. The computerized method of claim 1, wherein the generating of the alert or report responsive to the determining that the result of the image comparison exceeds the predefined threshold includes transmitting the alert or report to one or more of a user of an endpoint device or an analyst.
12. The computerized method of claim 1, wherein the executing of the machine learning model indicates that a plurality of webpage families are closely correlated with the subject screenshot and the performing the image comparison is between the subject screenshot and a screenshot of at least one webpage from each of the set of webpage families including the one or more screenshots corresponding to the webpages within the first webpage family.
13. The computerized method of claim 1, further comprising: responsive to the determining that the result of the image comparison is below the predefined threshold, indicating that the subject URL is not associated with the phishing attack.
14. A non-transitory computer-readable medium, when processed by one or more processors, analyzes a subject Uniform Resource Locator (URL) to determine whether the subject URL is associated with a phishing attack, the non-transitory computer-readable medium comprising: a feature generation logic module that, when executed by the one or more processors, detects a plurality of keypoints within a subject screenshot of a subject webpage corresponding to the subject URL and generates a subject feature vector that includes the detected plurality of keypoints; a classifier logic module that, when executed by the one or more processors, executes a machine learning model using the detected plurality of keypoints as input to the machine learning model to determine a set of confidences, wherein each confidence of the set of confidences reflects a potential relationship between the subject screenshot and a first webpage family within a set of webpage families, and wherein the first webpage family within the set of webpage families is associated with a highest confidence of the set of confidences, wherein the machine learning model is a representation of relationships between a set of training feature vectors representing a training set of URLs, each of the training feature vectors corresponds to a separate webpage family within the set of webpage families; an image comparator logic module that, when executed by the one or more processors, (i) performs an image comparison between the subject screenshot and one or more screenshots corresponding to webpages within the first webpage family, and (ii) determines whether a result of the image comparison exceeds a predefined threshold; and a reporting logic module that, when executed by the one or more processors, responsive to the image comparator logic module determining that the result of the image comparison exceeds the predefined threshold, generates an alert or report indicating that the subject URL is associated with the phishing attack.
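The decision flow recited in claim 14 can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the names `analyze_url`, `Verdict`, `classify`, `compare`, and `jaccard` are hypothetical, each keypoint is assumed to be reduced to a hashable descriptor, and the machine learning model and image comparison are passed in as callables rather than implemented. The set-overlap comparison is only an illustrative stand-in for the image comparison.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical representation: each keypoint is reduced to a hashable descriptor.
Keypoints = Sequence[Hashable]


@dataclass
class Verdict:
    is_phishing: bool
    family: str        # webpage family with the highest confidence
    confidence: float  # confidence assigned to that family
    best_score: float  # best image-comparison result within that family


def analyze_url(
    subject: Keypoints,
    classify: Callable[[Keypoints], Dict[str, float]],
    family_screenshots: Dict[str, List[Keypoints]],
    compare: Callable[[Keypoints, Keypoints], float],
    threshold: float,
) -> Verdict:
    # Run the (hypothetical) model to obtain one confidence per webpage family.
    confidences = classify(subject)
    # Select the family associated with the highest confidence.
    family = max(confidences, key=confidences.get)
    # Compare the subject screenshot against that family's stored screenshots.
    best = max(
        (compare(subject, ref) for ref in family_screenshots.get(family, [])),
        default=0.0,
    )
    # Flag the URL only when the comparison result exceeds the threshold.
    return Verdict(best > threshold, family, confidences[family], best)


def jaccard(a: Keypoints, b: Keypoints) -> float:
    """Illustrative comparison: overlap of two descriptor sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

A result at or below the threshold leaves `is_phishing` false, which corresponds to the "not associated" outcome of claims 13 and 25. Presumably the genuine domain of each family would be excluded beforehand (for example by the whitelist of claim 17), since the real page would otherwise match its own family.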
15. The computer-readable medium of claim 14, wherein the subject screenshot is captured as an image file.
16. The computer-readable medium of claim 14, further comprising: a content fetcher logic module that, when executed by the one or more processors, obtains the subject screenshot by accessing a data caching system that stores one or more previously captured screenshots.
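The data caching system of claim 16 is not specified further. A minimal sketch, assuming a simple in-memory store keyed by URL and a hypothetical `capture` callable that renders a URL and returns image bytes, could look like this:

```python
from typing import Callable, Dict


class ScreenshotCache:
    """Hypothetical in-memory stand-in for the data caching system of claim 16."""

    def __init__(self, capture: Callable[[str], bytes]) -> None:
        self._capture = capture  # renders the URL and returns image bytes
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        # Reuse a previously captured screenshot when one is stored.
        if url in self._store:
            self.hits += 1
            return self._store[url]
        # Otherwise capture the page once and keep it for later lookups.
        self.misses += 1
        shot = self._capture(url)
        self._store[url] = shot
        return shot
```

This sketch never evicts or expires entries; a deployed cache would likely need an expiry policy, because the content served at a phishing URL can change quickly.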
17. The computer-readable medium of claim 14, further comprising: a pre-filter logic module that, when executed by the one or more processors, performs static scanning of the subject URL, including an analysis of one or more of a blacklist or a whitelist.
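The static scan of claim 17 can be illustrated with a host-name check against both lists. This is a sketch under stated assumptions: the function names and the three return labels are hypothetical, and checking the blacklist before the whitelist is a design choice, not something the claim specifies.

```python
from typing import Iterable
from urllib.parse import urlsplit


def _host_matches(host: str, domains: Iterable[str]) -> bool:
    # Match the listed domain itself or any subdomain of it, but not
    # look-alike hosts such as "bank.com.evil.example".
    return any(host == d or host.endswith("." + d) for d in domains)


def prefilter(url: str, blacklist: Iterable[str], whitelist: Iterable[str]) -> str:
    host = (urlsplit(url).hostname or "").lower()
    if _host_matches(host, blacklist):
        return "malicious"  # known bad: no image analysis needed
    if _host_matches(host, whitelist):
        return "benign"     # genuine site of a protected brand
    return "unknown"        # pass on to the keypoint and model analysis
```

Matching on the dotted suffix rather than a plain substring keeps a host like `bank.com.evil.io` from inheriting the whitelist status of `bank.com`.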
18. The computer-readable medium of claim 14, wherein the image comparison includes: retrieving previously determined keypoints of a first screenshot of a first webpage; and correlating the previously determined keypoints of the first screenshot with the detected keypoints of the subject screenshot.
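One way to correlate stored keypoints with newly detected ones, as in claim 18, is descriptor matching. The sketch below assumes binary descriptors (of the kind produced by ORB or BRIEF) stored as Python integers, compares them by Hamming distance, and applies Lowe's ratio test. The ratio test and the fraction-of-matches score are swapped-in techniques, not the method stated in the claims, and `match_score` is a hypothetical name.

```python
from typing import Sequence


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two binary descriptors.
    return bin(a ^ b).count("1")


def match_score(
    subject: Sequence[int],
    reference: Sequence[int],
    ratio: float = 0.75,
) -> float:
    """Fraction of subject descriptors with an unambiguous match in the reference."""
    if not subject or len(reference) < 2:
        return 0.0
    good = 0
    for d in subject:
        best, second = sorted(hamming(d, r) for r in reference)[:2]
        # Ratio test: accept the match only if the nearest reference
        # descriptor is clearly closer than the runner-up.
        if best < ratio * second:
            good += 1
    return good / len(subject)
```

The resulting score in [0, 1] could then serve as the "result of the image comparison" tested against the predefined threshold.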
19. The computer-readable medium of claim 14, wherein each keypoint of the detected keypoints includes an image region having a known orientation within a set of pixels representing the subject screenshot.
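Claim 19 requires each keypoint to have a known orientation. One established way to assign it is the intensity-centroid method used by ORB: the angle from the patch center to its intensity-weighted centroid. The sketch below assumes a square grayscale patch given as nested lists; `patch_orientation` is a hypothetical name.

```python
import math
from typing import List


def patch_orientation(patch: List[List[float]]) -> float:
    """Orientation in radians of a square patch via the intensity centroid."""
    n = len(patch)
    c = (n - 1) / 2.0  # patch center in pixel coordinates
    m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            # First-order moments taken relative to the patch center.
            m10 += (x - c) * value
            m01 += (y - c) * value
    return math.atan2(m01, m10)
```

Image rows grow downward, so a patch that is brighter along its bottom row yields an angle of +pi/2.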
20. The computer-readable medium of claim 14, wherein the machine learning model includes a representation of a correlation of the set of training feature vectors, wherein a first feature vector corresponds to a first webpage corresponding to a first webpage family of a set of webpage families.
21. The computer-readable medium of claim 14, wherein the machine learning model is generated and trained with a plurality of screenshots of each of a plurality of webpages using a combination of an internet browser and an operating system.
22. The computer-readable medium of claim 14, wherein the performing of the image comparison includes: performing the image comparison between the subject screenshot and one or more screenshots corresponding to webpages of the first webpage family until a result of an image comparison is greater than or equal to the predefined threshold.
23. The computer-readable medium of claim 14, wherein generating the alert or report responsive to the determining that the result of the image comparison exceeds the predefined threshold includes transmitting the alert or report to one or more of a user of an endpoint device or an analyst.
24. The computer-readable medium of claim 14, wherein the executing of the machine learning model indicates that a plurality of webpage families are closely correlated with the subject screenshot and the performing the image comparison is between the subject screenshot and a screenshot of at least one webpage from each of the set of webpage families including the one or more screenshots corresponding to the webpages within the first webpage family.
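Claims 22 and 24 together describe comparing against one or more highly ranked families and stopping once a comparison reaches the threshold. A minimal sketch, assuming a hypothetical `top_k` cut-off for "closely correlated" families and a caller-supplied `compare` function:

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

Shot = TypeVar("Shot")  # any screenshot representation the compare function accepts


def compare_top_families(
    subject: Shot,
    confidences: Dict[str, float],
    family_screenshots: Dict[str, List[Shot]],
    compare: Callable[[Shot, Shot], float],
    threshold: float,
    top_k: int = 2,
) -> Tuple[Optional[str], float, int]:
    """Return (matched family or None, score, number of comparisons made)."""
    # Rank families by model confidence and keep the closest few.
    ranked = sorted(confidences, key=confidences.get, reverse=True)[:top_k]
    comparisons = 0
    for family in ranked:
        for ref in family_screenshots.get(family, []):
            comparisons += 1
            score = compare(subject, ref)
            # Stop as soon as one comparison reaches the threshold.
            if score >= threshold:
                return family, score, comparisons
    return None, 0.0, comparisons
```

The early exit bounds the cost of the comparison stage, which is usually far more expensive than running the classifier.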
25. The computer-readable medium of claim 14, further comprising: responsive to the determining that the result of the image comparison is below the predefined threshold, indicating that the subject URL is not associated with the phishing attack.
26. A non-transitory computer-readable medium, when processed by one or more processors, generates a machine learning model used in determining whether a subject Uniform Resource Locator (URL) is associated with a phishing attack, the non-transitory computer-readable medium comprising: a feature generation logic module that, when executed by the one or more processors, for each screenshot corresponding to a URL within a set of training URLs, detects keypoints within each screenshot and generates a feature vector for each screenshot that includes the detected keypoints of the corresponding screenshot; a domain mapper logic module that, when executed by the one or more processors, receives each feature vector generated by the feature generation logic module and labels each feature vector according to a webpage family of the screenshot to which the feature vector corresponds to generate a plurality of labeled feature vectors; a training logic module that, when executed by the one or more processors, generates the machine learning model including a digitized representation of a correlation of the plurality of labeled feature vectors corresponding to the set of training URLs, wherein the machine learning model is a representation of relationships between a set of training feature vectors representing the set of training URLs, each of the training feature vectors corresponds to a separate webpage family within a set of webpage families, and wherein execution of the machine learning model using a plurality of keypoints of a subject screenshot as input determines a set of confidences, wherein each confidence within the set of confidences reflects a potential relationship between the subject screenshot and a webpage family within the set of webpage families, and wherein a first webpage family within the set of webpage families is associated with a highest confidence of the set of confidences; and an image comparison logic module that, when executed by the one or more processors, (i) performs an image comparison between the subject screenshot and one or more screenshots corresponding to webpages within the first webpage family, and (ii) determines whether a result of the image comparison exceeds a predefined threshold.
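The domain mapper of claim 26 labels each training feature vector with its webpage family. A minimal sketch, assuming the family is derived from the training URL's host through a hypothetical `domain_to_family` mapping, and that URLs with no known family are simply skipped:

```python
from typing import Dict, List, Optional, Sequence, Tuple
from urllib.parse import urlsplit

FeatureVector = Sequence[float]


def family_for(url: str, domain_to_family: Dict[str, str]) -> Optional[str]:
    # Resolve the URL's host (or a parent domain) to a webpage family.
    host = (urlsplit(url).hostname or "").lower()
    for domain, family in domain_to_family.items():
        if host == domain or host.endswith("." + domain):
            return family
    return None


def label_feature_vectors(
    samples: List[Tuple[str, FeatureVector]],
    domain_to_family: Dict[str, str],
) -> List[Tuple[FeatureVector, str]]:
    labeled = []
    for url, vector in samples:
        family = family_for(url, domain_to_family)
        if family is not None:  # skip URLs with no known family
            labeled.append((vector, family))
    return labeled
```

Because several login pages of one brand can live on different subdomains, keying on the parent domain lets them share a single family label.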
27. The computer-readable medium of claim 26, wherein the subject screenshot is captured as an image file.
28. The computer-readable medium of claim 26, wherein the feature generation logic module uses at least one computer vision technique to detect the keypoints within each screenshot.
29. The computer-readable medium of claim 26, wherein the machine learning model represents one or more hyperplanes onto which features of each feature vector are categorized.
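Claim 29 describes the model as one or more hyperplanes that categorize features. A multiclass perceptron is one simple learner of that kind (a linear SVM is a common alternative); the sketch below uses it, with a softmax over the per-family scores standing in for the "set of confidences". Both choices, the fixed-length numeric feature vectors, and the function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_multiclass_perceptron(
    data: List[Tuple[Vector, str]], epochs: int = 20
) -> Dict[str, List[float]]:
    dim = len(data[0][0])
    # One hyperplane (weights plus a trailing bias term) per webpage family.
    weights = {label: [0.0] * (dim + 1) for _, label in data}
    for _ in range(epochs):
        for x, y in data:
            xb = list(x) + [1.0]
            pred = max(weights, key=lambda c: sum(w * v for w, v in zip(weights[c], xb)))
            if pred != y:
                # Move the true family's hyperplane toward x, the wrong one away.
                weights[y] = [w + v for w, v in zip(weights[y], xb)]
                weights[pred] = [w - v for w, v in zip(weights[pred], xb)]
    return weights


def family_confidences(weights: Dict[str, List[float]], x: Vector) -> Dict[str, float]:
    xb = list(x) + [1.0]
    scores = {c: sum(w * v for w, v in zip(ws, xb)) for c, ws in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}
```

The family with the largest confidence plays the role of the "first webpage family" that is then checked by image comparison.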
30. The non-transitory computer-readable medium of claim 26, further comprising: a content fetcher logic module that, when executed by the one or more processors, obtains a screenshot of a website corresponding to each URL within the set of training URLs, the set of training URLs including URLs determined as likely to be targeted for phishing attacks, wherein each screenshot is obtained using a different combination of an internet browser and an operating system.
31. The computer-readable medium of claim 30, wherein the content fetcher logic module obtains a first screenshot by accessing a data caching system that stores one or more previously captured screenshots.
32. The non-transitory computer-readable medium of claim 26, further comprising: a reporting logic module that, when executed by the one or more processors, responsive to the image comparison logic module determining that the result of the image comparison exceeds the predefined threshold, generates an alert or report indicating that the subject URL is associated with the phishing attack.

US Referenced Citations (721)

Number	Name	Date	Kind
4292580	Ott et al.	Sep 1981	A
5175732	Hendel et al.	Dec 1992	A
5319776	Hile et al.	Jun 1994	A
5440723	Arnold et al.	Aug 1995	A
5490249	Miller	Feb 1996	A
5657473	Killean et al.	Aug 1997	A
5802277	Cowlard	Sep 1998	A
5842002	Schnurer et al.	Nov 1998	A
5960170	Chen et al.	Sep 1999	A
5978917	Chi	Nov 1999	A
5983348	Ji	Nov 1999	A
6088803	Tso et al.	Jul 2000	A
6092194	Touboul	Jul 2000	A
6094677	Capek et al.	Jul 2000	A
6108799	Boulay et al.	Aug 2000	A
6154844	Touboul et al.	Nov 2000	A
6269330	Cidon et al.	Jul 2001	B1
6272641	Ji	Aug 2001	B1
6279113	Vaidya	Aug 2001	B1
6298445	Shostack et al.	Oct 2001	B1
6357008	Nachenberg	Mar 2002	B1
6424627	Sorhaug et al.	Jul 2002	B1
6442696	Wray et al.	Aug 2002	B1
6484315	Ziese	Nov 2002	B1
6487666	Shanklin et al.	Nov 2002	B1
6493756	O'Brien et al.	Dec 2002	B1
6550012	Villa et al.	Apr 2003	B1
6775657	Baker	Aug 2004	B1
6831893	Ben Nun et al.	Dec 2004	B1
6832367	Choi et al.	Dec 2004	B1
6895550	Kanchirayappa et al.	May 2005	B2
6898632	Gordy et al.	May 2005	B2
6907396	Muttik et al.	Jun 2005	B1
6941348	Petry et al.	Sep 2005	B2
6971097	Wallman	Nov 2005	B1
6981279	Arnold et al.	Dec 2005	B1
7007107	Ivchenko et al.	Feb 2006	B1
7028179	Anderson et al.	Apr 2006	B2
7043757	Hoefelmeyer et al.	May 2006	B2
7058822	Edery et al.	Jun 2006	B2
7069316	Gryaznov	Jun 2006	B1
7080407	Zhao et al.	Jul 2006	B1
7080408	Pak et al.	Jul 2006	B1
7093002	Wolff et al.	Aug 2006	B2
7093239	van der Made	Aug 2006	B1
7096498	Judge	Aug 2006	B2
7100201	Izatt	Aug 2006	B2
7107617	Hursey et al.	Sep 2006	B2
7159149	Spiegel et al.	Jan 2007	B2
7213260	Judge	May 2007	B2
7231667	Jordan	Jun 2007	B2
7240364	Branscomb et al.	Jul 2007	B1
7240368	Roesch et al.	Jul 2007	B1
7243371	Kasper et al.	Jul 2007	B1
7249175	Donaldson	Jul 2007	B1
7287278	Liang	Oct 2007	B2
7308716	Danford et al.	Dec 2007	B2
7328453	Merkle, Jr. et al.	Feb 2008	B2
7346486	Ivancic et al.	Mar 2008	B2
7356736	Natvig	Apr 2008	B2
7386888	Liang et al.	Jun 2008	B2
7392542	Bucher	Jun 2008	B2
7418729	Szor	Aug 2008	B2
7428300	Drew et al.	Sep 2008	B1
7441272	Durham et al.	Oct 2008	B2
7448084	Apap et al.	Nov 2008	B1
7458098	Judge et al.	Nov 2008	B2
7464404	Carpenter et al.	Dec 2008	B2
7464407	Nakae et al.	Dec 2008	B2
7467408	O'Toole, Jr.	Dec 2008	B1
7478428	Thomlinson	Jan 2009	B1
7480773	Reed	Jan 2009	B1
7487543	Arnold et al.	Feb 2009	B2
7496960	Chen et al.	Feb 2009	B1
7496961	Zimmer et al.	Feb 2009	B2
7519990	Xie	Apr 2009	B1
7523493	Liang et al.	Apr 2009	B2
7530104	Thrower et al.	May 2009	B1
7540025	Tzadikario	May 2009	B2
7546638	Anderson et al.	Jun 2009	B2
7565550	Liang et al.	Jul 2009	B2
7568233	Szor et al.	Jul 2009	B1
7584455	Ball	Sep 2009	B2
7603715	Costa et al.	Oct 2009	B2
7607171	Marsden et al.	Oct 2009	B1
7639714	Stolfo et al.	Dec 2009	B2
7644441	Schmid et al.	Jan 2010	B2
7657419	van der Made	Feb 2010	B2
7676841	Sobchuk et al.	Mar 2010	B2
7698548	Shelest et al.	Apr 2010	B2
7707633	Danford et al.	Apr 2010	B2
7712136	Sprosts et al.	May 2010	B2
7730011	Deninger et al.	Jun 2010	B1
7739740	Nachenberg et al.	Jun 2010	B1
7779463	Stolfo et al.	Aug 2010	B2
7784097	Stolfo et al.	Aug 2010	B1
7832008	Kraemer	Nov 2010	B1
7836502	Zhao et al.	Nov 2010	B1
7849506	Dansey et al.	Dec 2010	B1
7854007	Sprosts et al.	Dec 2010	B2
7869073	Oshima	Jan 2011	B2
7877803	Enstone et al.	Jan 2011	B2
7904959	Sidiroglou et al.	Mar 2011	B2
7908660	Bahl	Mar 2011	B2
7930738	Petersen	Apr 2011	B1
7937387	Frazier et al.	May 2011	B2
7937761	Bennett	May 2011	B1
7949849	Lowe et al.	May 2011	B2
7996556	Raghavan et al.	Aug 2011	B2
7996836	McCorkendale et al.	Aug 2011	B1
7996904	Chiueh et al.	Aug 2011	B1
7996905	Arnold et al.	Aug 2011	B2
8006305	Aziz	Aug 2011	B2
8010667	Zhang et al.	Aug 2011	B2
8020206	Hubbard et al.	Sep 2011	B2
8028338	Schneider et al.	Sep 2011	B1
8042184	Batenin	Oct 2011	B1
8045094	Teragawa	Oct 2011	B2
8045458	Alperovitch et al.	Oct 2011	B2
8069484	McMillan et al.	Nov 2011	B2
8087086	Lai et al.	Dec 2011	B1
8171553	Aziz et al.	May 2012	B2
8175387	Hsieh et al.	May 2012	B1
8176049	Deninger et al.	May 2012	B2
8176480	Spertus	May 2012	B1
8201246	Wu et al.	Jun 2012	B1
8204984	Aziz et al.	Jun 2012	B1
8214905	Doukhvalov et al.	Jul 2012	B1
8220055	Kennedy	Jul 2012	B1
8225288	Miller et al.	Jul 2012	B2
8225373	Kraemer	Jul 2012	B2
8233882	Rogel	Jul 2012	B2
8234640	Fitzgerald et al.	Jul 2012	B1
8234709	Viljoen et al.	Jul 2012	B2
8239944	Nachenberg et al.	Aug 2012	B1
8260914	Ranjan	Sep 2012	B1
8266091	Gubin et al.	Sep 2012	B1
8286251	Eker et al.	Oct 2012	B2
8291499	Aziz et al.	Oct 2012	B2
8307435	Mann et al.	Nov 2012	B1
8307443	Wang et al.	Nov 2012	B2
8312545	Tuvell et al.	Nov 2012	B2
8321936	Green et al.	Nov 2012	B1
8321941	Tuvell et al.	Nov 2012	B2
8332571	Edwards, Sr.	Dec 2012	B1
8365286	Poston	Jan 2013	B2
8365297	Parshin et al.	Jan 2013	B1
8370938	Daswani et al.	Feb 2013	B1
8370939	Zaitsev et al.	Feb 2013	B2
8375444	Aziz et al.	Feb 2013	B2
8381299	Stolfo et al.	Feb 2013	B2
8402529	Green et al.	Mar 2013	B1
8464340	Ahn et al.	Jun 2013	B2
8479174	Chiriac	Jul 2013	B2
8479276	Vaystikh et al.	Jul 2013	B1
8479291	Bodke	Jul 2013	B1
8510827	Leake et al.	Aug 2013	B1
8510828	Guo et al.	Aug 2013	B1
8510842	Amit et al.	Aug 2013	B2
8516478	Edwards et al.	Aug 2013	B1
8516590	Ranadive et al.	Aug 2013	B1
8516593	Aziz	Aug 2013	B2
8522348	Chen et al.	Aug 2013	B2
8528086	Aziz	Sep 2013	B1
8533824	Hutton et al.	Sep 2013	B2
8539582	Aziz et al.	Sep 2013	B1
8549638	Aziz	Oct 2013	B2
8555391	Demir et al.	Oct 2013	B1
8561177	Aziz et al.	Oct 2013	B1
8566476	Shifter et al.	Oct 2013	B2
8566946	Aziz et al.	Oct 2013	B1
8584094	Dadhia et al.	Nov 2013	B2
8584234	Sobel et al.	Nov 2013	B1
8584239	Aziz et al.	Nov 2013	B2
8595834	Xie et al.	Nov 2013	B2
8627476	Satish et al.	Jan 2014	B1
8635696	Aziz	Jan 2014	B1
8682054	Xue et al.	Mar 2014	B2
8682812	Ranjan	Mar 2014	B1
8689333	Aziz	Apr 2014	B2
8695096	Zhang	Apr 2014	B1
8713631	Pavlyushchik	Apr 2014	B1
8713681	Silberman et al.	Apr 2014	B2
8726392	McCorkendale et al.	May 2014	B1
8739280	Chess et al.	May 2014	B2
8776229	Aziz	Jul 2014	B1
8782792	Bodke	Jul 2014	B1
8789172	Stolfo et al.	Jul 2014	B2
8789178	Kejriwal et al.	Jul 2014	B2
8793278	Frazier et al.	Jul 2014	B2
8793787	Ismael et al.	Jul 2014	B2
8805947	Kuzkin et al.	Aug 2014	B1
8806647	Daswani et al.	Aug 2014	B1
8832829	Manni et al.	Sep 2014	B2
8850570	Ramzan	Sep 2014	B1
8850571	Staniford et al.	Sep 2014	B2
8881234	Narasimhan et al.	Nov 2014	B2
8881271	Butler, II	Nov 2014	B2
8881282	Aziz et al.	Nov 2014	B1
8898788	Aziz et al.	Nov 2014	B1
8935779	Manni et al.	Jan 2015	B2
8949257	Shifter et al.	Feb 2015	B2
8984638	Aziz et al.	Mar 2015	B1
8990939	Staniford et al.	Mar 2015	B2
8990944	Singh et al.	Mar 2015	B1
8997219	Staniford et al.	Mar 2015	B2
9009822	Ismael et al.	Apr 2015	B1
9009823	Ismael et al.	Apr 2015	B1
9027135	Aziz	May 2015	B1
9071638	Aziz et al.	Jun 2015	B1
9104867	Thioux et al.	Aug 2015	B1
9106630	Frazier et al.	Aug 2015	B2
9106694	Aziz et al.	Aug 2015	B2
9118715	Staniford et al.	Aug 2015	B2
9159035	Ismael et al.	Oct 2015	B1
9171160	Vincent et al.	Oct 2015	B2
9176843	Ismael et al.	Nov 2015	B1
9189627	Islam	Nov 2015	B1
9195829	Goradia et al.	Nov 2015	B1
9197664	Aziz et al.	Nov 2015	B1
9223972	Vincent et al.	Dec 2015	B1
9225740	Ismael et al.	Dec 2015	B1
9241010	Bennett et al.	Jan 2016	B1
9251343	Vincent et al.	Feb 2016	B1
9262635	Paithane et al.	Feb 2016	B2
9268936	Butler	Feb 2016	B2
9275229	LeMasters	Mar 2016	B2
9282109	Aziz et al.	Mar 2016	B1
9292686	Ismael et al.	Mar 2016	B2
9294501	Mesdaq et al.	Mar 2016	B2
9300686	Pidathala et al.	Mar 2016	B2
9306960	Aziz	Apr 2016	B1
9306974	Aziz et al.	Apr 2016	B1
9311479	Manni et al.	Apr 2016	B1
9355247	Thioux et al.	May 2016	B1
9356944	Aziz	May 2016	B1
9363280	Rivlin et al.	Jun 2016	B1
9367681	Ismael et al.	Jun 2016	B1
9398028	Karandikar et al.	Jul 2016	B1
9413781	Cunningham et al.	Aug 2016	B2
9426071	Caldejon et al.	Aug 2016	B1
9430646	Mushtaq et al.	Aug 2016	B1
9432389	Khalid et al.	Aug 2016	B1
9438613	Paithane et al.	Sep 2016	B1
9438622	Staniford et al.	Sep 2016	B1
9438623	Thioux et al.	Sep 2016	B1
9459901	Jung et al.	Oct 2016	B2
9467460	Otvagin et al.	Oct 2016	B1
9483644	Paithane et al.	Nov 2016	B1
9495180	Ismael	Nov 2016	B2
9497213	Thompson et al.	Nov 2016	B2
9507935	Ismael et al.	Nov 2016	B2
9516057	Aziz	Dec 2016	B2
9519782	Aziz et al.	Dec 2016	B2
9536091	Paithane et al.	Jan 2017	B2
9537972	Edwards et al.	Jan 2017	B1
9560059	Islam	Jan 2017	B1
9565202	Kindlund et al.	Feb 2017	B1
9591015	Amin et al.	Mar 2017	B1
9591020	Aziz	Mar 2017	B1
9594904	Jain et al.	Mar 2017	B1
9594905	Ismael et al.	Mar 2017	B1
9594912	Thioux et al.	Mar 2017	B1
9609007	Rivlin et al.	Mar 2017	B1
9626509	Khalid et al.	Apr 2017	B1
9628498	Aziz et al.	Apr 2017	B1
9628507	Haq et al.	Apr 2017	B2
9633134	Ross	Apr 2017	B2
9635039	Islam et al.	Apr 2017	B1
9641546	Manni et al.	May 2017	B1
9654485	Neumann	May 2017	B1
9661009	Karandikar et al.	May 2017	B1
9661018	Aziz	May 2017	B1
9674298	Edwards et al.	Jun 2017	B1
9680862	Ismael et al.	Jun 2017	B2
9690606	Ha et al.	Jun 2017	B1
9690933	Singh et al.	Jun 2017	B1
9690935	Shifter et al.	Jun 2017	B2
9690936	Malik et al.	Jun 2017	B1
9736179	Ismael	Aug 2017	B2
9740857	Ismael et al.	Aug 2017	B2
9747446	Pidathala et al.	Aug 2017	B1
9756074	Aziz et al.	Sep 2017	B2
9773112	Rathor et al.	Sep 2017	B1
9781144	Dtvagin et al.	Oct 2017	B1
9787700	Amin et al.	Oct 2017	B1
9787706	Otvagin et al.	Oct 2017	B1
9792196	Ismael et al.	Oct 2017	B1
9824209	Ismael et al.	Nov 2017	B1
9824211	Wilson	Nov 2017	B2
9824216	Khalid et al.	Nov 2017	B1
9825976	Gomez et al.	Nov 2017	B1
9825989	Mehra et al.	Nov 2017	B1
9838408	Karandikar et al.	Dec 2017	B1
9838411	Aziz	Dec 2017	B1
9838416	Aziz	Dec 2017	B1
9838417	Khalid et al.	Dec 2017	B1
9846776	Paithane et al.	Dec 2017	B1
9876701	Caldejon et al.	Jan 2018	B1
9888016	Amin et al.	Feb 2018	B1
9888019	Pidathala et al.	Feb 2018	B1
9910988	Vincent et al.	Mar 2018	B1
9912644	Cunningham	Mar 2018	B2
9912681	Ismael et al.	Mar 2018	B1
9912684	Aziz et al.	Mar 2018	B1
9912691	Mesdaq et al.	Mar 2018	B2
9912698	Thioux et al.	Mar 2018	B1
9916440	Paithane et al.	Mar 2018	B1
9921978	Chan et al.	Mar 2018	B1
9934376	Ismael	Apr 2018	B1
9934381	Kindlund et al.	Apr 2018	B1
9946568	Ismael et al.	Apr 2018	B1
9954890	Staniford et al.	Apr 2018	B1
9973531	Thioux	May 2018	B1
10002252	Ismael et al.	Jun 2018	B2
10019338	Goradia et al.	Jul 2018	B1
10019573	Silberman et al.	Jul 2018	B2
10025691	Ismael et al.	Jul 2018	B1
10025927	Khalid et al.	Jul 2018	B1
10027689	Rathor et al.	Jul 2018	B1
10027690	Aziz et al.	Jul 2018	B2
10027696	Rivlin et al.	Jul 2018	B1
10033747	Paithane et al.	Jul 2018	B1
10033748	Cunningham et al.	Jul 2018	B1
10033753	Islam et al.	Jul 2018	B1
10033759	Kabra et al.	Jul 2018	B1
10050998	Singh	Aug 2018	B1
10068091	Aziz et al.	Sep 2018	B1
10075455	Zafar et al.	Sep 2018	B2
10083302	Paithane et al.	Sep 2018	B1
10084813	Eyada	Sep 2018	B2
10089461	Ha et al.	Oct 2018	B1
10097573	Aziz	Oct 2018	B1
10104102	Neumann	Oct 2018	B1
10108446	Steinberg et al.	Oct 2018	B1
10121000	Rivlin et al.	Nov 2018	B1
10122746	Manni et al.	Nov 2018	B1
10133863	Bu et al.	Nov 2018	B2
10133866	Kumar et al.	Nov 2018	B1
10146810	Shiffer et al.	Dec 2018	B2
10148693	Singh et al.	Dec 2018	B2
10165000	Aziz et al.	Dec 2018	B1
10169585	Pilipenko et al.	Jan 2019	B1
10176321	Abbasi et al.	Jan 2019	B2
10181029	Ismael et al.	Jan 2019	B1
10191861	Steinberg et al.	Jan 2019	B1
10192052	Singh et al.	Jan 2019	B1
10198574	Thioux et al.	Feb 2019	B1
10200384	Mushtaq et al.	Feb 2019	B1
10210329	Malik et al.	Feb 2019	B1
10216927	Steinberg	Feb 2019	B1
10218740	Mesdaq et al.	Feb 2019	B1
10242185	Goradia	Mar 2019	B1
20010005889	Albrecht	Jun 2001	A1
20010047326	Broadbent et al.	Nov 2001	A1
20020018903	Kokubo et al.	Feb 2002	A1
20020038430	Edwards et al.	Mar 2002	A1
20020091819	Melchione et al.	Jul 2002	A1
20020095607	Lin-Hendel	Jul 2002	A1
20020116627	Tarbotton et al.	Aug 2002	A1
20020144156	Copeland	Oct 2002	A1
20020162015	Tang	Oct 2002	A1
20020166063	Lachman et al.	Nov 2002	A1
20020169952	DiSanto et al.	Nov 2002	A1
20020184528	Shevenell et al.	Dec 2002	A1
20020188887	Largman et al.	Dec 2002	A1
20020194490	Halperin et al.	Dec 2002	A1
20030021728	Shame et al.	Jan 2003	A1
20030074578	Ford et al.	Apr 2003	A1
20030084318	Schertz	May 2003	A1
20030101381	Mateev et al.	May 2003	A1
20030115483	Liang	Jun 2003	A1
20030188190	Aaron et al.	Oct 2003	A1
20030191957	Hypponen et al.	Oct 2003	A1
20030200460	Morota et al.	Oct 2003	A1
20030212902	van der Made	Nov 2003	A1
20030229801	Kouznetsov et al.	Dec 2003	A1
20030237000	Denton et al.	Dec 2003	A1
20040003323	Bennett et al.	Jan 2004	A1
20040006473	Mills et al.	Jan 2004	A1
20040015712	Szor	Jan 2004	A1
20040019832	Arnold et al.	Jan 2004	A1
20040047356	Bauer	Mar 2004	A1
20040083408	Spiegel et al.	Apr 2004	A1
20040088581	Brawn et al.	May 2004	A1
20040093513	Cantrell et al.	May 2004	A1
20040111531	Staniford et al.	Jun 2004	A1
20040117478	Triulzi et al.	Jun 2004	A1
20040117624	Brandt et al.	Jun 2004	A1
20040128355	Chao et al.	Jul 2004	A1
20040165588	Pandya	Aug 2004	A1
20040236963	Danford et al.	Nov 2004	A1
20040243349	Greifeneder et al.	Dec 2004	A1
20040249911	Alkhatib et al.	Dec 2004	A1
20040255161	Cavanaugh	Dec 2004	A1
20040268147	Wiederin et al.	Dec 2004	A1
20050005159	Oliphant	Jan 2005	A1
20050021740	Bar et al.	Jan 2005	A1
20050033960	Vialen et al.	Feb 2005	A1
20050033989	Poletto et al.	Feb 2005	A1
20050050148	Mohammadioun et al.	Mar 2005	A1
20050086523	Zimmer et al.	Apr 2005	A1
20050091513	Mitomo et al.	Apr 2005	A1
20050091533	Omote et al.	Apr 2005	A1
20050091652	Ross et al.	Apr 2005	A1
20050108562	Khazan et al.	May 2005	A1
20050114663	Cornell et al.	May 2005	A1
20050125195	Brendel	Jun 2005	A1
20050149726	Joshi et al.	Jul 2005	A1
20050157662	Bingham et al.	Jul 2005	A1
20050183143	Anderholm et al.	Aug 2005	A1
20050201297	Peikari	Sep 2005	A1
20050210533	Copeland et al.	Sep 2005	A1
20050238005	Chen et al.	Oct 2005	A1
20050240781	Gassoway	Oct 2005	A1
20050262562	Gassoway	Nov 2005	A1
20050265331	Stolfo	Dec 2005	A1
20050283839	Cowburn	Dec 2005	A1
20060010495	Cohen et al.	Jan 2006	A1
20060015416	Hoffman et al.	Jan 2006	A1
20060015715	Anderson	Jan 2006	A1
20060015747	Van de Ven	Jan 2006	A1
20060021029	Brickell et al.	Jan 2006	A1
20060021054	Costa et al.	Jan 2006	A1
20060031476	Mathes et al.	Feb 2006	A1
20060047665	Neil	Mar 2006	A1
20060070130	Costea et al.	Mar 2006	A1
20060075496	Carpenter et al.	Apr 2006	A1
20060095968	Portolani et al.	May 2006	A1
20060101516	Sudaharan et al.	May 2006	A1
20060101517	Banzhof et al.	May 2006	A1
20060117385	Mester et al.	Jun 2006	A1
20060123477	Raghavan et al.	Jun 2006	A1
20060143709	Brooks et al.	Jun 2006	A1
20060150249	Gassen et al.	Jul 2006	A1
20060161983	Cothrell et al.	Jul 2006	A1
20060161987	Levy-Yurista	Jul 2006	A1
20060161989	Reshef et al.	Jul 2006	A1
20060164199	Glide et al.	Jul 2006	A1
20060173992	Weber et al.	Aug 2006	A1
20060179147	Tran et al.	Aug 2006	A1
20060184632	Marino et al.	Aug 2006	A1
20060191010	Benjamin	Aug 2006	A1
20060221956	Narayan et al.	Oct 2006	A1
20060236393	Kramer et al.	Oct 2006	A1
20060242709	Seinfeld et al.	Oct 2006	A1
20060248519	Jaeger et al.	Nov 2006	A1
20060248582	Panjwani et al.	Nov 2006	A1
20060251104	Koga	Nov 2006	A1
20060288417	Bookbinder et al.	Dec 2006	A1
20070006288	Mayfield et al.	Jan 2007	A1
20070006313	Porras et al.	Jan 2007	A1
20070011174	Takaragi et al.	Jan 2007	A1
20070016951	Piccard et al.	Jan 2007	A1
20070019286	Kikuchi	Jan 2007	A1
20070033645	Jones	Feb 2007	A1
20070038943	FitzGerald et al.	Feb 2007	A1
20070064689	Shin et al.	Mar 2007	A1
20070074169	Chess et al.	Mar 2007	A1
20070094730	Bhikkaji et al.	Apr 2007	A1
20070101435	Konanka et al.	May 2007	A1
20070128855	Cho et al.	Jun 2007	A1
20070142030	Sinha et al.	Jun 2007	A1
20070143827	Nicodemus et al.	Jun 2007	A1
20070156895	Vuong	Jul 2007	A1
20070157180	Tillmann et al.	Jul 2007	A1
20070157306	Elrod et al.	Jul 2007	A1
20070168988	Eisner et al.	Jul 2007	A1
20070171824	Ruello et al.	Jul 2007	A1
20070174915	Gribble et al.	Jul 2007	A1
20070192500	Lum	Aug 2007	A1
20070192858	Lum	Aug 2007	A1
20070198275	Malden et al.	Aug 2007	A1
20070208822	Wang et al.	Sep 2007	A1
20070220607	Sprosts et al.	Sep 2007	A1
20070240218	Tuvell et al.	Oct 2007	A1
20070240219	Tuvell et al.	Oct 2007	A1
20070240220	Tuvell et al.	Oct 2007	A1
20070240222	Tuvell et al.	Oct 2007	A1
20070250930	Aziz et al.	Oct 2007	A1
20070256132	Oliphant	Nov 2007	A2
20070271446	Nakamura	Nov 2007	A1
20080005782	Aziz	Jan 2008	A1
20080018122	Zierler et al.	Jan 2008	A1
20080028463	Dagon et al.	Jan 2008	A1
20080040710	Chiriac	Feb 2008	A1
20080046781	Childs et al.	Feb 2008	A1
20080066179	Liu	Mar 2008	A1
20080072326	Danford et al.	Mar 2008	A1
20080077793	Tan et al.	Mar 2008	A1
20080080518	Hoeflin et al.	Apr 2008	A1
20080086720	Lekel	Apr 2008	A1
20080098476	Syversen	Apr 2008	A1
20080120722	Sima et al.	May 2008	A1
20080134178	Fitzgerald et al.	Jun 2008	A1
20080134334	Kim et al.	Jun 2008	A1
20080141376	Clausen et al.	Jun 2008	A1
20080162449	Chao-Yu	Jul 2008	A1
20080184367	McMillan et al.	Jul 2008	A1
20080184373	Traut et al.	Jul 2008	A1
20080189787	Arnold et al.	Aug 2008	A1
20080201778	Guo et al.	Aug 2008	A1
20080209557	Herley et al.	Aug 2008	A1
20080215742	Goldszmidt et al.	Sep 2008	A1
20080222729	Chen et al.	Sep 2008	A1
20080263665	Ma et al.	Oct 2008	A1
20080295172	Bohacek	Nov 2008	A1
20080301810	Lehane et al.	Dec 2008	A1
20080307524	Singh et al.	Dec 2008	A1
20080313738	Enderby	Dec 2008	A1
20080320594	Jiang	Dec 2008	A1
20090003317	Kasralikar et al.	Jan 2009	A1
20090006361	Abuelsaad	Jan 2009	A1
20090007100	Field et al.	Jan 2009	A1
20090013408	Schipka	Jan 2009	A1
20090031423	Liu et al.	Jan 2009	A1
20090036111	Danford et al.	Feb 2009	A1
20090037835	Goldman	Feb 2009	A1
20090044024	Oberheide et al.	Feb 2009	A1
20090044274	Budko et al.	Feb 2009	A1
20090064332	Porras et al.	Mar 2009	A1
20090077666	Chen et al.	Mar 2009	A1
20090083369	Marmor	Mar 2009	A1
20090083855	Apap et al.	Mar 2009	A1
20090089879	Wang et al.	Apr 2009	A1
20090094697	Provos et al.	Apr 2009	A1
20090113425	Ports et al.	Apr 2009	A1
20090125976	Wassermann et al.	May 2009	A1
20090126015	Monastyrsky et al.	May 2009	A1
20090126016	Sobko et al.	May 2009	A1
20090133125	Choi et al.	May 2009	A1
20090144823	Lamastra et al.	Jun 2009	A1
20090158430	Borders	Jun 2009	A1
20090172815	Gu et al.	Jul 2009	A1
20090187992	Poston	Jul 2009	A1
20090193293	Stolfo et al.	Jul 2009	A1
20090198651	Shifter et al.	Aug 2009	A1
20090198670	Shifter et al.	Aug 2009	A1
20090198689	Frazier et al.	Aug 2009	A1
20090199274	Frazier et al.	Aug 2009	A1
20090199296	Xie et al.	Aug 2009	A1
20090228233	Anderson et al.	Sep 2009	A1
20090241187	Troyansky	Sep 2009	A1
20090241190	Todd et al.	Sep 2009	A1
20090265692	Godefroid et al.	Oct 2009	A1
20090271867	Zhang	Oct 2009	A1
20090300415	Zhang et al.	Dec 2009	A1
20090300761	Park et al.	Dec 2009	A1
20090328185	Berg et al.	Dec 2009	A1
20090328221	Blumfield et al.	Dec 2009	A1
20100005146	Drako et al.	Jan 2010	A1
20100011205	McKenna	Jan 2010	A1
20100017546	Poo et al.	Jan 2010	A1
20100030996	Butler, II	Feb 2010	A1
20100031353	Thomas et al.	Feb 2010	A1
20100037314	Perdisci et al.	Feb 2010	A1
20100043073	Kuwamura	Feb 2010	A1
20100054278	Stolfo et al.	Mar 2010	A1
20100058474	Hicks	Mar 2010	A1
20100064044	Nonoyama	Mar 2010	A1
20100077481	Polyakov et al.	Mar 2010	A1
20100083376	Pereira et al.	Apr 2010	A1
20100115621	Staniford et al.	May 2010	A1
20100132038	Zaitsev	May 2010	A1
20100154056	Smith et al.	Jun 2010	A1
20100180344	Malyshev et al.	Jul 2010	A1
20100192223	Ismael et al.	Jul 2010	A1
20100220863	Dupaquis et al.	Sep 2010	A1
20100235831	Dittmer	Sep 2010	A1
20100251104	Massand	Sep 2010	A1
20100281102	Chinta et al.	Nov 2010	A1
20100281541	Stolfo et al.	Nov 2010	A1
20100281542	Stolfo et al.	Nov 2010	A1
20100287260	Peterson et al.	Nov 2010	A1
20100299754	Amit et al.	Nov 2010	A1
20100306173	Frank	Dec 2010	A1
20110004737	Greenebaum	Jan 2011	A1
20110025504	Lyon et al.	Feb 2011	A1
20110041179	St Hlberg	Feb 2011	A1
20110047594	Mahaffey et al.	Feb 2011	A1
20110047620	Mahaffey et al.	Feb 2011	A1
20110055907	Narasimhan et al.	Mar 2011	A1
20110078794	Manni et al.	Mar 2011	A1
20110093951	Aziz	Apr 2011	A1
20110099620	Stavrou et al.	Apr 2011	A1
20110099633	Aziz	Apr 2011	A1
20110099635	Silberman et al.	Apr 2011	A1
20110113231	Kaminsky	May 2011	A1
20110145918	Jung et al.	Jun 2011	A1
20110145920	Mahaffey et al.	Jun 2011	A1
20110145934	Abramovici et al.	Jun 2011	A1
20110167493	Song et al.	Jul 2011	A1
20110167494	Bowen et al.	Jul 2011	A1
20110173213	Frazier et al.	Jul 2011	A1
20110173460	Ito et al.	Jul 2011	A1
20110219449	St. Neitzel et al.	Sep 2011	A1
20110219450	McDougal et al.	Sep 2011	A1
20110225624	Sawhney et al.	Sep 2011	A1
20110225655	Niemela et al.	Sep 2011	A1
20110247072	Staniford et al.	Oct 2011	A1
20110265182	Peinado et al.	Oct 2011	A1
20110289582	Kejriwal et al.	Nov 2011	A1
20110302587	Nishikawa et al.	Dec 2011	A1
20110307954	Melnik et al.	Dec 2011	A1
20110307955	Kaplan et al.	Dec 2011	A1
20110307956	Yermakov et al.	Dec 2011	A1
20110314546	Aziz et al.	Dec 2011	A1
20120023593	Puder et al.	Jan 2012	A1
20120054869	Yen et al.	Mar 2012	A1
20120066698	Yanoo	Mar 2012	A1
20120079596	Thomas et al.	Mar 2012	A1
20120084859	Radinsky et al.	Apr 2012	A1
20120096553	Srivastava et al.	Apr 2012	A1
20120110667	Zubrilin et al.	May 2012	A1
20120117239	Holloway	May 2012	A1
20120117652	Manni et al.	May 2012	A1
20120121154	Xue et al.	May 2012	A1
20120124426	Maybee et al.	May 2012	A1
20120158626	Zhu	Jun 2012	A1
20120159620	Seifert	Jun 2012	A1
20120174186	Aziz et al.	Jul 2012	A1
20120174196	Bhogavilli et al.	Jul 2012	A1
20120174218	McCoy et al.	Jul 2012	A1
20120198279	Schroeder	Aug 2012	A1
20120210423	Friedrichs et al.	Aug 2012	A1
20120222121	Staniford et al.	Aug 2012	A1
20120255015	Sahita et al.	Oct 2012	A1
20120255017	Sallam	Oct 2012	A1
20120260342	Dube et al.	Oct 2012	A1
20120266244	Green et al.	Oct 2012	A1
20120278886	Luna	Nov 2012	A1
20120297489	Dequevy	Nov 2012	A1
20120330801	McDougal et al.	Dec 2012	A1
20120331553	Aziz et al.	Dec 2012	A1
20130014259	Gable et al.	Jan 2013	A1
20130036472	Aziz	Feb 2013	A1
20130047257	Aziz	Feb 2013	A1
20130074185	McDougal et al.	Mar 2013	A1
20130086684	Mohler	Apr 2013	A1
20130097699	Balupari et al.	Apr 2013	A1
20130097700	Chen et al.	Apr 2013	A1
20130097706	Titonis et al.	Apr 2013	A1
20130111587	Goel et al.	May 2013	A1
20130117852	Stute	May 2013	A1
20130117855	Kim et al.	May 2013	A1
20130139264	Brinkley et al.	May 2013	A1
20130160125	Likhachev et al.	Jun 2013	A1
20130160127	Jeong et al.	Jun 2013	A1
20130160130	Mendelev et al.	Jun 2013	A1
20130160131	Madou et al.	Jun 2013	A1
20130167236	Sick	Jun 2013	A1
20130174214	Duncan	Jul 2013	A1
20130185789	Hagiwara et al.	Jul 2013	A1
20130185795	Winn et al.	Jul 2013	A1
20130185798	Saunders et al.	Jul 2013	A1
20130191915	Antonakakis et al.	Jul 2013	A1
20130196649	Paddon et al.	Aug 2013	A1
20130227691	Aziz et al.	Aug 2013	A1
20130246370	Bartram et al.	Sep 2013	A1
20130247186	Lemasters	Sep 2013	A1
20130263260	Mahaffey et al.	Oct 2013	A1
20130291109	Staniford et al.	Oct 2013	A1
20130298243	Kumar et al.	Nov 2013	A1
20130318038	Shiffer et al.	Nov 2013	A1
20130318073	Shiffer et al.	Nov 2013	A1
20130325791	Shifter et al.	Dec 2013	A1
20130325792	Shifter et al.	Dec 2013	A1
20130325871	Shifter et al.	Dec 2013	A1
20130325872	Shifter et al.	Dec 2013	A1
20140032875	Butler	Jan 2014	A1
20140033307	Schmidtler	Jan 2014	A1
20140053260	Gupta et al.	Feb 2014	A1
20140053261	Gupta et al.	Feb 2014	A1
20140130158	Wang et al.	May 2014	A1
20140137180	Lukacs et al.	May 2014	A1
20140169762	Ryu	Jun 2014	A1
20140179360	Jackson et al.	Jun 2014	A1
20140181131	Ross	Jun 2014	A1
20140189687	Jung et al.	Jul 2014	A1
20140189866	Shifter et al.	Jul 2014	A1
20140189882	Jung et al.	Jul 2014	A1
20140237600	Silberman et al.	Aug 2014	A1
20140280245	Wilson	Sep 2014	A1
20140283037	Sikorski et al.	Sep 2014	A1
20140283063	Thompson et al.	Sep 2014	A1
20140328204	Klotsche et al.	Nov 2014	A1
20140337836	Ismael	Nov 2014	A1
20140344926	Cunningham et al.	Nov 2014	A1
20140351935	Shao et al.	Nov 2014	A1
20140380473	Bu et al.	Dec 2014	A1
20140380474	Paithane et al.	Dec 2014	A1
20150007312	Pidathala et al.	Jan 2015	A1
20150033341	Schmidtler et al.	Jan 2015	A1
20150096022	Vincent et al.	Apr 2015	A1
20150096023	Mesdaq et al.	Apr 2015	A1
20150096024	Haq et al.	Apr 2015	A1
20150096025	Ismael	Apr 2015	A1
20150180886	Staniford et al.	Jun 2015	A1
20150186645	Aziz et al.	Jul 2015	A1
20150199513	Ismael et al.	Jul 2015	A1
20150199531	Ismael et al.	Jul 2015	A1
20150199532	Ismael et al.	Jul 2015	A1
20150220735	Paithane et al.	Aug 2015	A1
20150372980	Eyada	Dec 2015	A1
20160004869	Ismael et al.	Jan 2016	A1
20160006756	Ismael et al.	Jan 2016	A1
20160044000	Cunningham	Feb 2016	A1
20160127393	Aziz et al.	May 2016	A1
20160191547	Zafar et al.	Jun 2016	A1
20160191550	Ismael et al.	Jun 2016	A1
20160261612	Mesdaq et al.	Sep 2016	A1
20160285914	Singh et al.	Sep 2016	A1
20160301703	Aziz	Oct 2016	A1
20160335110	Paithane et al.	Nov 2016	A1
20160352772	O'Connor	Dec 2016	A1
20170083703	Abbasi et al.	Mar 2017	A1
20180013770	Ismael	Jan 2018	A1
20180048660	Paithane et al.	Feb 2018	A1
20180115584	Alhumaisan	Apr 2018	A1
20180121316	Ismael et al.	May 2018	A1
20180288077	Siddiqui et al.	Oct 2018	A1

Foreign Referenced Citations (13)

Number	Date	Country
105763543	Jul 2016	CN
2439806	Jan 2008	GB
2490431	Oct 2012	GB
0206928	Jan 2002	WO
0223805	Mar 2002	WO
2007117636	Oct 2007	WO
2008041950	Apr 2008	WO
2011084431	Jul 2011	WO
2011112348	Sep 2011	WO
2012075336	Jun 2012	WO
2012145066	Oct 2012	WO
2013067505	May 2013	WO
2014208937	Dec 2014	WO

Non-Patent Literature Citations (63)

Entry
Kazemian et al., “Comparisons of machine learning techniques for detecting malicious webpages”, 2014.
“Mining Specification of Malicious Behavior”—Jha et al, UCSB, Sep. 2007 https://www.cs.ucsb.edu/~chris/research/doc/esec07_mining.pdf.
“Network Security: NetDetector—Network Intrusion Forensic System (NIFS) Whitepaper”, (“NetDetector Whitepaper”), (2003).
“When Virtual is Better Than Real”, IEEEXplore Digital Library, available at, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=990073, (Dec. 7, 2013).
Abdullah, et al., Visualizing Network Data for Intrusion Detection, 2005 IEEE Workshop on Information Assurance and Security, pp. 100-108.
Adetoye, Adedayo , et al., “Network Intrusion Detection & Response System”, (“Adetoye”), (Sep. 2003).
Apostolopoulos, George; Hassapis, Constantinos; “V-eM: A cluster of Virtual Machines for Robust, Detailed, and High-Performance Network Emulation”, 14th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, Sep. 11-14, 2006, pp. 117-126.
Aura, Tuomas, “Scanning electronic documents for personally identifiable information”, Proceedings of the 5th ACM workshop on Privacy in electronic society. ACM, 2006.
Baecher, “The Nepenthes Platform: An Efficient Approach to collect Malware”, Springer-Verlag Berlin Heidelberg, (2006), pp. 165-184.
Bayer, et al., “Dynamic Analysis of Malicious Code”, J Comput Virol, Springer-Verlag, France, (2006), pp. 67-77.
Boubalos, Chris , “extracting syslog data out of raw pcap dumps, seclists.org, Honeypots mailing list archives”, available at http://seclists.org/honeypots/2003/q2/319 (“Boubalos”), (Jun. 5, 2003).
Chaudet, C. , et al., “Optimal Positioning of Active and Passive Monitoring Devices”, International Conference on Emerging Networking Experiments and Technologies, Proceedings of the 2005 ACM Conference on Emerging Network Experiment and Technology, CoNEXT '05, Toulouse, France, (Oct. 2005), pp. 71-82.
Chen, P. M. and Noble, B. D., “When Virtual is Better Than Real, Department of Electrical Engineering and Computer Science”, University of Michigan (“Chen”) (2001).
Cisco “Intrusion Prevention for the Cisco ASA 5500-x Series” Data Sheet (2012).
Cohen, M.I. , “PyFlag—an advanced network forensic framework”, Digital investigation 5, Elsevier, (2008), pp. S112-S120.
Costa, M. , et al., “Vigilante: End-to-End Containment of Internet Worms”, SOSP '05, Association for Computing Machinery, Inc., Brighton U.K., (Oct. 23-26, 2005).
Didier Stevens, “Malicious PDF Documents Explained”, Security & Privacy, IEEE, IEEE Service Center, Los Alamitos, CA, US, vol. 9, No. 1, Jan. 1, 2011, pp. 80-82, XP011329453, ISSN: 1540-7993, DOI: 10.1109/MSP.2011.14.
Distler, “Malware Analysis: An Introduction”, SANS Institute InfoSec Reading Room, SANS Institute, (2007).
Dunlap, George W. , et al., “ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay”, Proceedings of the 5th Symposium on Operating Systems Design and Implementation, USENIX Association, (“Dunlap”), (Dec. 9, 2002).
FireEye Malware Analysis & Exchange Network, Malware Protection System, FireEye Inc., 2010.
FireEye Malware Analysis, Modern Malware Forensics, FireEye Inc., 2010.
FireEye v.6.0 Security Target, pp. 1-35, Version 1.1, FireEye Inc., May 2011.
Goel, et al., Reconstructing System State for Intrusion Analysis, Apr. 2008 SIGOPS Operating Systems Review, vol. 42 Issue 3, pp. 21-28.
Gregg Keizer: “Microsoft's HoneyMonkeys Show Patching Windows Works”, Aug. 8, 2005, XP055143386, Retrieved from the Internet: URL:http://www.informationweek.com/microsofts-honeymonkeys-show-patching-windows-works/d/d-id/1035069? [retrieved on Jun. 1, 2016].
Heng Yin et al, Panorama: Capturing System-Wide Information Flow for Malware Detection and Analysis, Research Showcase @ CMU, Carnegie Mellon University, 2007.
Hiroshi Shinotsuka, Malware Authors Using New Techniques to Evade Automated Threat Analysis Systems, Oct. 26, 2012, http://www.symantec.com/connect/blogs/, pp. 1-4.
Idika et al., A Survey of Malware Detection Techniques, Feb. 2, 2007, Department of Computer Science, Purdue University.
Isohara, Takamasa, Keisuke Takemori, and Ayumu Kubota. “Kernel-based behavior analysis for android malware detection.” Computational Intelligence and Security (CIS), 2011 Seventh International Conference on. IEEE, 2011.
Kaeo, Merike , “Designing Network Security”, (“Kaeo”), (Nov. 2003).
Kevin A Roundy et al: “Hybrid Analysis and Control of Malware”, Sep. 15, 2010, Recent Advances in Intrusion Detection, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 317-338, XP019150454 ISBN:978-3-642-15511-6.
Khaled Salah et al: “Using Cloud Computing to Implement a Security Overlay Network”, Security & Privacy, IEEE, IEEE Service Center, Los Alamitos, CA, US, vol. 11, No. 1, Jan. 1, 2013 (Jan. 1, 2013).
Kim, H. , et al., “Autograph: Toward Automated, Distributed Worm Signature Detection”, Proceedings of the 13th USENIX Security Symposium (Security 2004), San Diego, (Aug. 2004), pp. 271-286.
King, Samuel T., et al., “Operating System Support for Virtual Machines”, (“King”), (2003).
Kreibich, C. , et al., “Honeycomb-Creating Intrusion Detection Signatures Using Honeypots”, 2nd Workshop on Hot Topics in Networks (HotNets-II), Boston, USA, (2003).
Kristoff, J. , “Botnets, Detection and Mitigation: DNS-Based Techniques”, NU Security Day, (2005), 23 pages.
Lastline Labs, The Threat of Evasive Malware, Feb. 25, 2013, Lastline Labs, pp. 1-8.
Li et al., A VMM-Based System Call Interposition Framework for Program Monitoring, Dec. 2010, IEEE 16th International Conference on Parallel and Distributed Systems, pp. 706-711.
Lindorfer, Martina, Clemens Kolbitsch, and Paolo Milani Comparetti. “Detecting environment-sensitive malware.” Recent Advances in Intrusion Detection. Springer Berlin Heidelberg, 2011.
Marchette, David J., “Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint”, (“Marchette”), (2001).
Moore, D. , et al., “Internet Quarantine: Requirements for Containing Self-Propagating Code”, Infocom, vol. 3, (Mar. 30-Apr. 3, 2003), pp. 1901-1910.
Morales, Jose A., et al., “Analyzing and exploiting network behaviors of malware.”, Security and Privacy in Communication Networks. Springer Berlin Heidelberg, 2010. 20-34.
Mori, Detecting Unknown Computer Viruses, 2004, Springer-Verlag Berlin Heidelberg.
Natvig, Kurt , “Sandbox II: Internet”, Virus Bulletin Conference, (“Natvig”), (Sep. 2002).
NetBIOS Working Group. Protocol Standard for a NetBIOS Service on a TCP/UDP transport: Concepts and Methods. STD 19, RFC 1001, Mar. 1987.
Newsome, J. , et al., “Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software”, in Proceedings of the 12th Annual Network and Distributed System Security Symposium (NDSS '05), (Feb. 2005).
Nojiri, D. , et al., “Cooperation Response Strategies for Large Scale Attack Mitigation”, DARPA Information Survivability Conference and Exposition, vol. 1, (Apr. 22-24, 2003), pp. 293-302.
Oberheide et al., CloudAV: N-Version Antivirus in the Network Cloud, 17th USENIX Security Symposium USENIX Security '08 Jul. 28-Aug. 1, 2008 San Jose, CA.
Reiner Sailer, Enriquillo Valdez, Trent Jaeger, Ronald Perez, Leendert van Doorn, John Linwood Griffin, Stefan Berger, sHype: Secure Hypervisor Approach to Trusted Virtualized Systems (Feb. 2, 2005) (“Sailer”).
Silicon Defense, “Worm Containment in the Internal Network”, (Mar. 2003), pp. 1-25.
Singh, S. , et al., “Automated Worm Fingerprinting”, Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation, San Francisco, California, (Dec. 2004).
Thomas H. Ptacek, and Timothy N. Newsham , “Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection”, Secure Networks, (“Ptacek”), (Jan. 1998).
Haruta Shuichiro et al: “Visual Similarity-Based Phishing Detection Scheme Using Image and CSS With Target Website Finder.” GLOBECOM 2017—2017 IEEE Global Communications Conference, IEEE, Dec. 4, 2017.
Kazemian H B et al: “Comparisons of Machine Learning Techniques for Detecting Malicious Webpages”, Expert Systems With Applications, Oxford, GB, vol. 42, No. 3, Sep. 16, 2014.
Max-Emanuel Maurer et al: “Using Visual Website Similarity for Phishing Detection and Reporting”, Proceedings of the 2012 ACM Annual Conference Extended Abstracts on Human Factors in Computing Systems Extended Abstracts, CHI EA '12, May 5, 2012.
PCT/US2018/053561 filed Sep. 28, 2018 International Search Report and Written Opinion dated Jan. 18, 2019.
Venezia, Paul , “NetDetector Captures Intrusions”, InfoWorld Issue 27, (“Venezia”), (Jul. 14, 2003).
Vladimir Getov: “Security as a Service in Smart Clouds—Opportunities and Concerns”, Computer Software and Applications Conference (COMPSAC), 2012 IEEE 36th Annual, IEEE, Jul. 16, 2012 (Jul. 16, 2012).
Wahid et al., Characterising the Evolution in Scanning Activity of Suspicious Hosts, Oct. 2009, Third International Conference on Network and System Security, pp. 344-350.
Whyte, et al., “DNS-Based Detection of Scanning Worms in an Enterprise Network”, Proceedings of the 12th Annual Network and Distributed System Security Symposium, (Feb. 2005), 15 pages.
Williamson, Matthew M., “Throttling Viruses: Restricting Propagation to Defeat Malicious Mobile Code”, ACSAC Conference, Las Vegas, NV, USA, (Dec. 2002), pp. 1-9.
Yuhei Kawakoya et al: “Memory behavior-based automatic malware unpacking in stealth debugging environment”, Malicious and Unwanted Software (Malware), 2010 5th International Conference on, IEEE, Piscataway, NJ, USA, Oct. 19, 2010, pp. 39-46, XP031833827, ISBN:978-1-4244-8-9353-1.
Zhang et al., The Effects of Threading, Infection Time, and Multiple-Attacker Collaboration on Malware Propagation, Sep. 2009, IEEE 28th International Symposium on Reliable Distributed Systems, pp. 73-82.
Afroz, S. et al. “PhishZoo: Detecting Phishing Websites by Looking at Them”, 2011 IEEE Fifth International Conference on Semantic Computing. DOI: 10.1109/ICSC.2011.52. Sep. 2011.

Related Publications (1)

	Number	Date	Country
	20190104154 A1	Apr 2019	US
