Method and system for detecting malware

Information

  • Patent Grant
  • 9525699
  • Patent Number
    9,525,699
  • Date Filed
    Monday, September 30, 2013
  • Date Issued
    Tuesday, December 20, 2016
Abstract
A system and method of analysis. NX domain names are collected from an asset in a real network. The NX domain names are domain names that are not registered. The real network NX domain names are utilized to create testing vectors. The testing vectors are classified as benign vectors or malicious vectors based on training vectors. The asset is then classified as infected if the NX testing vector created from the real network NX domain names is classified as a malicious vector.
Description
BRIEF DESCRIPTION OF THE FIGURES


FIG. 1 illustrates a system for detecting malware, according to one embodiment.



FIGS. 2-4 illustrate a method for detecting malware, according to one embodiment.



FIG. 5 illustrates various elements involved in domain name resolution.



FIGS. 6-10 illustrate examples for detecting malware, according to several embodiments.







DESCRIPTION OF EMBODIMENTS OF THE INVENTION


FIG. 1 illustrates a system for detecting malware, according to one embodiment. FIG. 1 illustrates at least one network 101 (e.g., the Internet) connecting at least one NX application 105 (described below) on at least one server 120 to at least one honeypot 110 and at least one entity's network 125 (e.g., a private network of a company). The NX application 105 can determine if one or more assets 115 (e.g., computers) on the at least one entity's network 125 are infected with malware. It should be noted that the asset can be a simple asset (e.g., mainframe hardware, storage) or a complex asset (e.g., licensed software).


The determination of whether an asset is infected can comprise: collecting NX domain names from at least one honeypot and at least one asset; using the honeypot NX domain names to create training vectors; using the real network NX domain names to create testing vectors; classifying the testing vectors as benign vectors or malicious vectors; and classifying the at least one asset in the at least one real network as infected if the NX testing vector created from the real network NX domain names is classified as a malicious vector. (It should be noted that the testing vectors can be classified using: simple internal assets infected with known malware; simple internal assets infected with unknown malware; or complex internal network assets; or any combination thereof.)


NX domain name information is useful because some malware takes advantage of existing domain name system (DNS) services such as free domain testing (e.g., determining whether a new domain name is available). Such malware can use a domain name generator that employs a seed, such as the date, together with an algorithm to generate a set of domain names. The command and control (C&C) can try to register the generated domain names until a registrable subset of domain names has been identified. An infected computer can then use that daily-generated set of domain names in order to establish a new communication channel with the C&C. The victim computers will employ the same seed (i.e., the date) and algorithm to generate the same set of domain names. The victim computers will then use the generated domain names in attempts to contact the C&C computer. Eventually, each victim computer will find a domain name that was registered for the C&C computer to enable daily communication between the C&C computer and the victim computers. By changing the domain name for the C&C computer (e.g., daily), it becomes difficult to statically blacklist the domain names or the IP addresses of the C&C computer(s).
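
The shared-seed idea described above can be illustrated with a toy generator. This is a minimal sketch, not the algorithm of any real malware family: the function name `generate_domains`, the SHA-256 construction, the label lengths, and the TLD list are all assumptions, chosen only to show that two parties using the same date and the same algorithm derive the same candidate list.

```python
import hashlib
from datetime import date

def generate_domains(seed_date, count=10, tlds=("com", "net", "biz")):
    """Derive `count` pseudo-random domain names from a date seed (toy example)."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed_date.isoformat()}:{i}".encode()).digest()
        length = 8 + digest[0] % 5  # hypothetical label length of 8 to 12
        label = "".join(chr(ord("a") + b % 26) for b in digest[1:1 + length])
        domains.append(f"{label}.{tlds[i % len(tlds)]}")
    return domains

# Both sides compute the same list for the same day, and a new list the next day.
today = generate_domains(date(2013, 9, 30))
assert today == generate_domains(date(2013, 9, 30))
assert today != generate_domains(date(2013, 10, 1))
```

Because most of the generated names are never registered, lookups for them are what surface as NX domain names at the resolver.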


Thus, malware which uses the above domain name resolution to establish communication with a C&C can produce many NX-Domains (NXs), which can be domain names that have not been registered with an authoritative DNS and can be observable at a recursive DNS server (“RDNS”). RDNS servers map domain names to IP addresses, also called “resolving DNS queries”. If such a mapping between a domain name and an IP address doesn't exist, the RDNS can send back to the initiator of the DNS query a “Non-Existence” response. The Non-Existence response can indicate that the domain name does not have an IP address, and is thus an NX-Domain (NX). Monitoring the NXs observable at an RDNS can provide the ability to collect all possible NXs generated from all computers connected to the RDNS.
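
As a minimal sketch of how a monitor at the RDNS might recognize a “Non-Existence” response, the following reads the response code from the 12-byte DNS message header, where an RCODE of 3 (“Name Error”) marks a non-existent domain. The function name `is_nxdomain` and the sample header values are illustrative assumptions; the text does not describe how the NX application parses traffic.

```python
import struct

NXDOMAIN = 3  # RCODE value for "Name Error" in the DNS header

def is_nxdomain(message: bytes) -> bool:
    """Return True if a raw DNS message is a response with RCODE 3."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    is_response = bool(flags & 0x8000)  # QR bit set means response
    return is_response and (flags & 0x000F) == NXDOMAIN

# Header only: ID, flags, then the four section counts.
nx_reply = struct.pack("!HHHHHH", 0x1234, 0x8183, 1, 0, 0, 0)
ok_reply = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
```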



FIG. 2 illustrates a method for creating training vectors, according to one embodiment. Referring to FIG. 2, in 205, malware NXs can be collected from at least one honeypot (e.g., an Internet-attached server that acts as a decoy, luring in potential hackers in order to study their activities and monitor how they are able to break into a system) by an NX application 105 and grouped into sets of, for example, 10.


The malware NXs can be collected so that a classifier can be trained in a controlled environment to recognize different categories of infected computers. For example, FIG. 5 illustrates a honeypot network configuration. In this example, the virtual machine named “kritis” operates as an internal, virtual gateway for the virtual machines dns01, dns02 and dns03, which are infected with malware (e.g., sinowal worm, bobax worm). By monitoring the DNS traffic that originates from infected virtual machines dns01, dns02, and dns03, a pure seed of malware domain names can be obtained.


In FIG. 5, the computer called “minoas” can act as an open recursive DNS (ORDNS), which can be an RDNS server willing to resolve a domain name for any host in the Internet (inside or outside its network), and as an authoritative DNS server for root DNS servers. By doing this, the minoas computer can provide NXs that appear to originate from the root DNS servers. This can force the malware to look up the next domain name and not stop probing since the minoas computer does not allow the malware to contact the root servers. As indicated earlier, the malware needs to make contact with the C&C at least one time during a set period (e.g., daily). Thus, by providing NX answers to any domain name that the malware requests, the “minoas” computer can cause the malware to keep looking up all the generated domain names (e.g., 10,000) because no successful C&C connection will take place. In this way, all 10,000 domain names can be observed and can be used to train a statistical class that can identify malware based only on this traffic.


Referring again to FIG. 5, the “kritis” computer can be configured to give free Internet access to the dns01, dns02, and dns03 computers for one hour, and for the next eight hours to redirect the DNS traffic to the “minoas” computer. A simple IP table firewall “rotating rule” at the gateway point (e.g., at the “kritis” computer) can be used to do this.


The VMNET 34 computer in FIG. 5 can be a virtual network connection between the virtual machines dns01, dns02, and dns03, and the virtual gateway “kritis”.


Referring back to FIG. 2, in 210, training vectors can be created by taking each set of, for example, 10 domain names, computing various statistical values, and putting the various statistical values in a vector. Example statistics are illustrated in FIG. 6, which is described in more detail below.


Those of ordinary skill in the art will see that training vectors can be created in many other ways, in addition to collecting NXs from honeypots, as described above.



FIG. 3 illustrates a method for creating testing vectors, according to one embodiment. In 305, NXs are collected from a real network. In 310, the NXs from the real network can be used to create testing vectors by taking each set of, for example, 10 NX domain names, computing various statistical values, and putting the various statistical values in a vector. (It should be noted that both the honeypot NXs and the real network NXs can be grouped in any number, and any algorithm can be used to group the sets.) It is not known if the testing NXs are malware or not. Thus, in 315, the testing vectors can be classified as benign vectors or malicious vectors by comparing testing vectors to training vectors. A classifier can use the knowledge obtained from the statistical information from the training vectors and compare it to the statistical information from the testing vectors to identify each different malware family in the testing NX vectors. FIG. 8 illustrates several types of classifiers that can be used to compare the vector information and identify different malware families. In particular, FIG. 8 illustrates the following classifiers: Naïve Bayes, LAD Tree, Multi-Layer Perceptron, Logistic Regression, and IBK Lazy. Those of ordinary skill in the art will see that many other types of classifiers can also be used. In addition, as explained in more detail below with respect to FIG. 8, a meta-classifier can use many different types of classifiers. In some embodiments, as also described in more detail below with respect to FIG. 8, a confidence score can also be given for each classifier, as well as for the meta-classifier.


For example, an absolute timing sequence, which can list the domain names in the order that they are received, can be used to group together an example set of ten NX domain names (e.g., from a real network):

    • fpemcjfbv.com
    • odkigktjzv.biz
    • odkigktjzv.biz.ebay.com
    • l-sjn-sevans.ca1.paypal.com
    • xvoal.com
    • ymtaiwwprpq.biz
    • ymtaiwwprpq.biz.ebay.com
    • bcbkdfkg.net
    • bcbkdfkg.net.ebay.com
    • okxixsulas.net

An example of various statistical values that can be computed for the set of NX domain names is illustrated in FIG. 6. Note that many other types of statistical values can be computed, and that the vector can have more or fewer statistical values than those called for in FIG. 6 (e.g., 17). Thus, for the example of 10 NX domain names provided above, the following statistical values can be computed. It should be noted that some or all of these statistical values can be computed. In addition, other statistical values can be computed and used.

    • The average of domain name length (not including “.”) (e.g., the domain name length of the first domain name is 13). [Value≈12.8333]
    • The standard deviation of the domain name length. [Value≈1.9507]
    • The number of different Top Level Domains (TLDs). [Value≈3.0]
    • The length of the longest domain name (excluding the TLD). [Value≈24.0]
    • The median of the frequency of each unique character across the entire set of domain names (e.g., the frequency of “o” across the entire set of 10 domain names above is 10). [Value≈2.0]
    • The average frequency of each unique character across the entire set of domain names. [Value≈2.2083]
    • The standard deviation of the frequency of each unique character across the entire set of domain names. [Value≈0.9565]
    • The median of the frequency of each unique 2-gram across the entire set of 10 domain names (e.g., the frequency of “fp” across the entire set of 10 domain names above is 1) (Note that if there is a “.” (e.g., “v.c”) between two characters, the frequency is counted as 0.) [Value≈0.9565]
    • The average of the frequency of each unique 2-gram across the entire set of 10 domain names. [Value≈1.0]
    • The standard deviation of the frequency of each unique 2-gram across the entire set of 10 domain names. [Value≈1.0]
    • The frequency of .com TLDs over the frequency of the other TLDs. [Value≈1.5]
    • The median of the frequency of each unique 3-gram across the entire set of 10 domain names. [Value≈0.3333]
    • The average of the frequency of each unique 3-gram across the entire set of 10 domain names. [Value≈1.0]
    • The standard deviation of the frequency of each unique 3-gram across the entire set of 10 domain names. [Value≈1.0]
    • The median count of unique TLDs (excluding .com). [Value≈2.0]
    • The average count of unique TLDs (excluding .com). [Value≈2.0]
    • The standard deviation for the different frequencies for each different TLD in the set of domain names. [Value≈2.0]
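
The character and n-gram statistics in the list above can be sketched as follows. The definitions here are assumptions: “frequency” is taken to be the occurrence count of each unique n-gram across the set, n-grams are taken within each dot-separated label so that none spans a “.”, TLD labels are counted like any other label, and the standard deviation is the population form. The helper names `ngram_counts` and `frequency_stats` are hypothetical, and the sketch is not expected to reproduce the values quoted in the list.

```python
from collections import Counter
from statistics import mean, median, pstdev

def ngram_counts(names, n):
    """Count every n-gram in the set; n-grams never span a "." separator."""
    counts = Counter()
    for name in names:
        for label in name.split("."):
            for i in range(len(label) - n + 1):
                counts[label[i:i + n]] += 1
    return counts

def frequency_stats(names, n):
    """Median, mean, and population standard deviation of the n-gram counts."""
    freqs = list(ngram_counts(names, n).values())
    return median(freqs), mean(freqs), pstdev(freqs)

names = ["fpemcjfbv.com", "odkigktjzv.biz", "xvoal.com"]
char_stats = frequency_stats(names, 1)     # unique characters
bigram_stats = frequency_stats(names, 2)   # unique 2-grams
trigram_stats = frequency_stats(names, 3)  # unique 3-grams
```

With `n=1` the same helper covers the per-character statistics, so one routine yields nine of the listed values.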


The various statistical values for each set of 10 domain names from the real network NXs can be put in a vector. An example illustrating the domain names being transformed to statistical vectors, using the statistical values set forth in FIG. 6, is illustrated in FIG. 7. Referring to FIG. 7, in 705, the 10 domain names used to create the vector are listed. Note that all of these domain names can come from one particular asset 115 (e.g., an infected computer) in the real network 125:

    • fpemcjfbv.com
    • odkigktjzv.biz
    • odkigktjzv.biz.inter1.com
    • l-sjn-sevans.ca1.intern2.com
    • xvoal.com
    • ymtaiwwprpq.biz
    • ymtaiwwprpq.biz.inter1.com
    • bcbkdfkg.net
    • bcbkdfkg.net.inter1.com
    • okxixsulas.net

The 17 statistical values corresponding to the statistical values found in FIG. 6 are illustrated in the vector 710: [12.8333, 1.9507, 3.0, 24.0, 2.0, 2.2083, 0.9565, 0.9565, 1.0, 1.0, 1.5, 0.3333, 1.0, 1.0, 0.0, 2.0, 2.0, 2.0].
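
Three of the TLD-related entries of vector 710 can be reproduced from the FIG. 7 domain names with a short sketch. The definitions are assumptions: the TLD is taken to be the text after the last “.”, the length of the longest name is measured with interior dots included, and the .com ratio is the count of .com names divided by the count of all other names. The function name `tld_features` is hypothetical. Under these assumptions the sketch yields 3.0, 24.0, and 1.5, which agree with the third, fourth, and eleventh entries of vector 710; the remaining entries depend on definitions the text does not fully pin down and are not attempted here.

```python
from collections import Counter

def tld_features(names):
    """Return (distinct TLD count, longest name without its TLD, .com ratio)."""
    tlds = Counter(name.rsplit(".", 1)[1] for name in names)
    longest = max(len(name.rsplit(".", 1)[0]) for name in names)  # dots counted
    others = sum(tlds.values()) - tlds["com"]
    com_ratio = tlds["com"] / others if others else float("inf")
    return float(len(tlds)), float(longest), com_ratio

fig7_names = [
    "fpemcjfbv.com", "odkigktjzv.biz", "odkigktjzv.biz.inter1.com",
    "l-sjn-sevans.ca1.intern2.com", "xvoal.com", "ymtaiwwprpq.biz",
    "ymtaiwwprpq.biz.inter1.com", "bcbkdfkg.net", "bcbkdfkg.net.inter1.com",
    "okxixsulas.net",
]
print(tld_features(fig7_names))  # (3.0, 24.0, 1.5)
```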


The NX application 105 can then utilize a meta-classifier to classify the testing vectors. The meta-classifier is a hybrid classifier and can comprise several generic classifiers. The various generic classifiers can be used (e.g., in parallel) to capture various different statistical properties which can potentially lower false positives (FP) and increase true positives (TP).


For example, FIG. 8 illustrates a meta-classifier that comprises five different classifiers: the Naïve Bayes classifier 805, the LAD Tree classifier 810, the Multi-Layer Perceptron Neural Network classifier 815, the Logistic Regression classifier 820, and the IBK Lazy Classifier 825. The maximum probability includes the classification (given by a particular classifier for the malware) and the probability of this classification being correct. Thus, for example, five different types of classifiers can be used to classify the malware as follows:

  • Classifier 1 (Naive Bayes Meta.) is: notknown (Confidence: 1)
  • Classifier 2 (Multi Layer Per. Meta.) is: conficker-B (Confidence: 0.985572986223)
  • Classifier 3 (Logistic Regression Meta.) is: conficker-B (Confidence: 0.374297598072)
  • Classifier 4 (LADtree Meta.) is: conficker-B (Confidence: 0.220571723953)
  • Classifier 5 (Lazy IB1 Meta.) is: conficker-B (Confidence: 1)


The majority voting can take the many classifications and determine which classification the majority of classifiers found. Thus, for the example above, the majority of the classifiers (four out of five) classified the malware as conficker-B. The final class is the final classification based on the majority voting, which is conficker-B.


It should be noted that the meta-classifier can use any number and any type of known or unknown classifier, including, but not limited to, the above classifiers. The Naïve Bayes classifier can use estimator classes. Numeric estimator precision values can be chosen based on analysis of the training data. The LAD tree classifier can generate a multi-class alternating decision tree using a LogitBoost strategy. The Multi-Layer Perceptron Neural Network classifier can use back-propagation to classify instances. The Logistic Regression classifier can build linear logistic regression models. LogitBoost with simple regression can function as a base learner and can be used for fitting the logistic models. The IBK Lazy classifier can use normalized Euclidean distance to find the training instance closest to the given test instance, and can predict the same class as the training instance. If multiple instances have the same (smallest) distance to the test instance, the first one found can be used.
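
The nearest-instance rule described for the IBK Lazy classifier can be sketched in a few lines. This is an illustration under stated assumptions, not the classifier's actual implementation: normalization is assumed to be min-max scaling of each attribute over the training data, and the function name `ib1_predict` and the toy two-attribute vectors are hypothetical.

```python
import math

def ib1_predict(train, labels, test):
    """Return the label of the nearest training vector (first one wins on ties)."""
    dims = range(len(test))
    lo = [min(row[d] for row in train) for d in dims]
    hi = [max(row[d] for row in train) for d in dims]

    def scale(vec):
        # Min-max scale each attribute; constant attributes map to 0.
        return [(vec[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
                for d in dims]

    target = scale(test)
    best_label, best_dist = None, math.inf
    for row, label in zip(train, labels):
        dist = math.dist(scale(row), target)
        if dist < best_dist:  # strict "<" keeps the first instance on a tie
            best_label, best_dist = label, dist
    return best_label

train = [[0.0, 0.0], [10.0, 100.0], [10.0, 100.0]]
labels = ["benign", "bobax", "sinowal"]
```

Scaling matters here because the second attribute spans a range ten times wider than the first; without it, that attribute would dominate the distance.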


Additional information about all of the above classifiers can be found in Richard O. Duda et al., PATTERN CLASSIFICATION (2nd. Edition), which is herein incorporated by reference. Further information about the IBK Lazy classifier can be found in Niels Landwehr et al., LOGISTIC MODEL TREES (2005), which is also herein incorporated by reference.


For example, each classifier in the meta-classifier can classify vector 710 as follows:

  • Classifier 1 (Naive Bayes Meta.) is: notknown (Confidence: 1)
  • Classifier 2 (Multi Layer Per. Meta.) is: conficker-B (Confidence: 0.985572986223)
  • Classifier 3 (Logistic Regression Meta.) is: conficker-B (Confidence: 0.374297598072)
  • Classifier 4 (LADtree Meta.) is: conficker-B (Confidence: 0.220571723953)
  • Classifier 5 (Lazy IB1 Meta.) is: conficker-B (Confidence: 1)


Using the classification of the vector by each classifier, if a confidence threshold is set to be >=0.9 (note that this value can be set by the user), the meta-classifier can classify the vector (or statistical instance) as follows:


Instance 1 Meta classification detection result: conficker-B with majority voting value: 4 with confidence (med/std): (0.985572986223/0.345308923709). This means that a majority of four (out of five) of the classifiers found the vector to be classified as conficker-B. The confidence is reported as a pair: the median of all five of the confidence scores, followed by the standard deviation of all five of the confidence scores. It should be noted that, because the confidence threshold is set to be >=0.9, this result is only meaningful if the median confidence score is >=0.9.
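
The meta-classification result above can be recomputed from the five per-classifier outputs. The sketch below assumes the population standard deviation, which matches the quoted value of about 0.3453; the function name `meta_classify` and the layout of the returned dictionary are hypothetical.

```python
from collections import Counter
from statistics import median, pstdev

def meta_classify(results, threshold=0.9):
    """Majority vote over (label, confidence) pairs plus median/std confidence."""
    labels = [label for label, _ in results]
    confidences = [conf for _, conf in results]
    winner, votes = Counter(labels).most_common(1)[0]
    med = median(confidences)
    return {
        "label": winner,
        "votes": votes,
        "median": med,
        "std": pstdev(confidences),
        "meaningful": med >= threshold,  # compare the median to the threshold
    }

results = [
    ("notknown", 1.0),               # Naive Bayes
    ("conficker-B", 0.985572986223), # Multi-Layer Perceptron
    ("conficker-B", 0.374297598072), # Logistic Regression
    ("conficker-B", 0.220571723953), # LAD tree
    ("conficker-B", 1.0),            # Lazy IB1
]
outcome = meta_classify(results)
```

For these inputs the vote is four for conficker-B, the median confidence is 0.985572986223, and the median clears the 0.9 threshold.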



FIG. 9 illustrates False Positive (FP) and True Positive (TP) classification results from the meta-classifier of FIG. 8 to one of six different malware classes: conficker-A, conficker-B, conficker-C, sinowal, bobax, and unknown. FIG. 9 indicates a FP value and a TP value for each type of malware. The FP rate is the false positive detection rate for each different class. The TP rate is the true positive detection rate for each different class. The FP rate can correspond to the percentage of vectors mistakenly classified as malicious which were actually benign. The TP rate corresponds to the percentage of vectors classified as malicious that were actually malicious. The following article, which is herein incorporated by reference, describes FP and TP rates in more detail: Axelsson, S., The Base-Rate Fallacy and the Difficulty of Intrusion Detection, ACM TRANS. INF. SYST. SECUR. 3, 3 (August 2000), 186-205.
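
Per-class rates of this kind can be computed one class at a time, treating every other class as negative. The sketch below assumes the conventional definitions, TP rate = TP / (TP + FN) and FP rate = FP / (FP + TN), which may differ in detail from how FIG. 9 was produced. The function name `class_rates` and the sample labels are invented for illustration and are not results from the figure.

```python
def class_rates(actual, predicted, cls):
    """One-vs-rest (TP rate, FP rate) for class `cls`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == cls and p == cls for a, p in pairs)
    fn = sum(a == cls and p != cls for a, p in pairs)
    fp = sum(a != cls and p == cls for a, p in pairs)
    tn = sum(a != cls and p != cls for a, p in pairs)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate

# Made-up labels for five vectors.
actual = ["bobax", "bobax", "sinowal", "unknown", "unknown"]
predicted = ["bobax", "unknown", "sinowal", "bobax", "unknown"]
tp_rate, fp_rate = class_rates(actual, predicted, "bobax")
```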


It should be noted that the meta-classifier can be independent from the manner in which the NXs are collected. It is only necessary to keep a mapping between the NXs and the internal asset that the NXs originated from. The detection flow is satisfied as long as the monitoring system in the real network collects NXs from the same internal asset and groups them into sets of 10 using the absolute timing sequence. This is because the classifier can be trained to detect such behavior. Thus, the trained classifier can utilize domain names collected in the same way in real time.
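
A minimal sketch of this grouping step is shown below: NXs are kept per asset in arrival order, and a set is emitted as soon as an asset has accumulated the required number. The function name `group_by_asset`, the `(asset, name)` event format, and the host names are assumptions made for illustration.

```python
from collections import defaultdict

def group_by_asset(events, size=10):
    """Yield (asset, names) each time an asset has produced `size` NXs.

    `events` is an iterable of (asset, domain_name) pairs in arrival order.
    """
    pending = defaultdict(list)
    for asset, name in events:
        pending[asset].append(name)
        if len(pending[asset]) == size:
            yield asset, pending.pop(asset)

events = [
    ("host1", "a.com"), ("host2", "x.net"), ("host1", "b.com"),
    ("host1", "c.com"), ("host2", "y.net"),
]
groups = list(group_by_asset(events, size=2))
```

Incomplete sets (here the single pending name for `host1`) simply wait for more traffic from the same asset.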



FIG. 9 also illustrates how domain names from known malware (e.g., sinowal, bobax, conficker) can be classified by the meta-classifier using information learned from the training set. Domain names that do not match the training set can be classified as “notknown” or “unknownBot”. Hand verification or other methods can be used to classify the “notknown” domain names.


It should be noted that if many NXs are classified as “unknown”, either a DNS issue causes such characterization, or the NXs are from malware where little or no information about the malware is known (e.g., a new worm). DNS issues can include a DNS outage or DNS misconfiguration. If a DNS issue is the cause of the high number of “unknown” classifications, the NXs can be classified as legitimate using, for example, alexa.com or a passive DNS feed. A passive DNS feed can be a technology which constructs zone replicas without cooperation from zone administrators, based on captured name server responses (see, e.g., F. Weimer, Passive DNS Replications, http://www.enyo.de/fw/software/dnslogger/2007, which is herein incorporated by reference). An example of a passive DNS feed is DNSParse, which can be, for example, an implementation of the passive DNS technology by the University of Auckland in New Zealand (see, e.g., https://dnsparse.insec.auckland.ac.nz/dns/2007, which is herein incorporated by reference).



FIG. 10 illustrates an example of how to identify similar patterns in NXs and use those patterns to train a new class (e.g., an unknown-bot class). For example, reviewing the NXs of FIG. 10, a person or computer program could determine malware patterns such as, but not limited to: a size of 8 (after www) with a top level domain of .com.
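
A pattern of the kind mentioned above (“www”, then a label of size 8, then .com) could be checked with a regular expression. The expression, the allowed character set, the function name `matches_pattern`, and the sample names are assumptions made for illustration; FIG. 10 itself is not reproduced here.

```python
import re

# Hypothetical pattern: "www.", an 8-character label, then ".com".
PATTERN = re.compile(r"www\.[a-z0-9-]{8}\.com", re.IGNORECASE)

def matches_pattern(name):
    """Return True if the whole name fits the assumed unknown-bot pattern."""
    return PATTERN.fullmatch(name) is not None

candidates = ["www.abcdefgh.com", "www.abcdefg.com", "www.abcdefgh.net"]
flagged = [name for name in candidates if matches_pattern(name)]
```

Names flagged this way could then be labeled and used as training data for a new class.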


While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail can be made therein without departing from the spirit and scope of the present invention. Thus, the present invention should not be limited by any of the above-described exemplary embodiments.


In addition, it should be understood that the figures described above, which highlight the functionality and advantages of the present invention, are presented for example purposes only. The architecture of the present invention is sufficiently flexible and configurable, such that it may be utilized in ways other than that shown in the figures.


Further, the purpose of the Abstract of the Disclosure is to enable the U.S. Patent and Trademark Office and the public generally, and especially the scientists, engineers and practitioners in the art who are not familiar with patent or legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of the technical disclosure of the application. The Abstract of the Disclosure is not intended to be limiting as to the scope of the present invention in any way.


Finally, it is the applicant's intent that only claims that include the express language “means for” or “step for” be interpreted under 35 U.S.C. 112, paragraph 6. Claims that do not expressly include the phrase “means for” or “step for” are not to be interpreted under 35 U.S.C. 112, paragraph 6.

Claims
  • 1. A method of analysis, comprising: collecting, using at least one processor circuit in communication with at least one database, NX domain names from at least one asset in at least one real network, the NX domain names being domain names that are not registered; utilizing, using the at least one processor circuit in communication with at least one database, statistical information about the NX domain names to create testing vectors; and classifying, using the at least one processor circuit in communication with at least one database, the testing vectors as benign vectors or malicious vectors based on training vectors by comparing the statistical information in the testing vectors to statistical information in training vectors, the statistical information comprising: an average of domain name length; a standard deviation of a domain name length; a number of different top level domains; a length of a domain name excluding a top level domain; a median of a number of unique characters; an average of a number of unique characters; a standard deviation of a number of unique characters; a median of unique 2-grams; an average of unique 2-grams; a standard deviation of unique 2-grams; a frequency of .com top level domains over frequency of remaining top level domains; a median of unique 3-grams; an average of unique 3-grams; a standard deviation of unique 3-grams; a median count of unique top level domains; an average count of unique top level domains; or a standard deviation count of top level domains; or any combination thereof.
  • 2. The method of claim 1, further comprising using at least one meta-classifier comprising at least two classifiers.
  • 3. The method of claim 2, wherein the meta-classifier provides intelligence for identifying new malware.
  • 4. The method of claim 1, wherein only NX domain traffic is utilized.
  • 5. The method of claim 1, wherein similar patterns in NX domain names are identified and used to model new botnets.
  • 6. A system of analysis, comprising: at least one processor circuit in communication with at least one database, the at least one processor circuit connected to at least one network and configured for: collecting NX domain names from at least one asset in at least one real network, the NX domain names being domain names that are not registered; utilizing statistical information about the NX domain names to create testing vectors; and classifying the testing vectors as benign vectors or malicious vectors based on training vectors by comparing the statistical information in the testing vectors to statistical information in training vectors, the statistical information comprising: an average of domain name length; a standard deviation of a domain name length; a number of different top level domains; a length of a domain name excluding a top level domain; a median of a number of unique characters; an average of a number of unique characters; a standard deviation of a number of unique characters; a median of unique 2-grams; an average of unique 2-grams; a standard deviation of unique 2-grams; a frequency of .com top level domains over frequency of remaining top level domains; a median of unique 3-grams; an average of unique 3-grams; a standard deviation of unique 3-grams; a median count of unique top level domains; an average count of unique top level domains; or a standard deviation count of top level domains; or any combination thereof.
  • 7. The system of claim 6, further comprising using at least one meta-classifier comprising at least two classifiers.
  • 8. The system of claim 7, wherein the meta-classifier provides intelligence for identifying new malware.
  • 9. The system of claim 6, wherein only NX domain traffic is utilized.
  • 10. The system of claim 6, wherein similar patterns in NX domain names are identified and used to model new botnets.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 12/985,140 filed Jan. 5, 2011, which claims benefit of U.S. Provisional Patent Application No. 61/292,592 filed Jan. 6, 2010, and U.S. Provisional Patent Application No. 61/295,060 filed Jan. 14, 2010, the contents of which are incorporated herein by reference in their entireties.

Non-Patent Literature Citations (64)
Entry
U.S. Appl. No. 14/015,611, filed Aug. 30, 2013, Pending.
U.S. Appl. No. 14/096,803, filed Dec. 4, 2013, Pending.
Manos Antonakakis et al., “Building a Dynamic Reputation System for DNS”, 19th USENIX Security Symposium, Aug. 11-13, 2010 (17 pages).
Manos Antonakakis et al., “From Throw-Away Traffic to Bots: Detecting the rise of DGA-Based Malware”, In Proceedings of the 21st USENIX Conference on Security Symposium (Security'12), (2012) (16 pages).
Yajin Zhou et al., “Dissecting Android Malware: Characterization and Evolution”, 2012 IEEE Symposium on Security and Privacy, pp. 95-109 (2012).
File History of U.S. Appl. No. 11/538,212.
File History of U.S. Appl. No. 12/538,612.
File History of U.S. Appl. No. 12/985,140.
File History of U.S. Appl. No. 13/008,257.
File History of U.S. Appl. No. 13/205,928.
File History of U.S. Appl. No. 13/309,202.
File History of U.S. Appl. No. 13/358,303.
File History of U.S. Appl. No. 13/749,205.
File History of U.S. Appl. No. 14/010,016.
File History of U.S. Appl. No. 14/015,582.
File History of U.S. Appl. No. 14/015,621.
File History of U.S. Appl. No. 14/015,663.
File History of U.S. Appl. No. 14/015,704.
File History of U.S. Appl. No. 14/015,661.
File History of U.S. Appl. No. 14/096,803.
File History of U.S. Appl. No. 14/194,076.
File History of U.S. Appl. No. 14/305,998.
File History of U.S. Appl. No. 14/317,785.
File History of U.S. Appl. No. 14/304,015.
File History of U.S. Appl. No. 14/616,387.
File History of U.S. Appl. No. 14/668,329.
File History of U.S. Appl. No. 12/538,612, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 13/205,928, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 13/749,205, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/015,582, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/015,663, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/015,704, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/015,661, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/096,803, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/305,998, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/317,785, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 15/019,272, electronically captured from PAIR on Feb. 12, 2016.
File History of U.S. Appl. No. 12/538,612, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
File History of U.S. Appl. No. 13/205,928, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
File History of U.S. Appl. No. 13/309,202, electronically captured from PAIR on Apr. 4, 2016 for Nov. 19, 2015 to Apr. 4, 2016.
File History of U.S. Appl. No. 14/015,582, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
File History of U.S. Appl. No. 14/015,704, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
File History of U.S. Appl. No. 14/194,076, electronically captured from PAIR on Apr. 4, 2016 for Nov. 19, 2015 to Apr. 4, 2016.
File History of U.S. Appl. No. 14/305,998, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
Leo Breiman, “Bagging Predictors”, Machine Learning, vol. 24, pp. 123-140 (1996).
David S. Anderson et al., “Spamscatter: Characterizing Internet Scam Hosting Infrastructure”, Proceedings of the USENIX Security Symposium (2007) (14 pages).
Sujata Garera et al., “A Framework for Detection and Measurement of Phishing Attacks”, WORM'07, pp. 1-8, Nov. 2, 2007.
Torsten Hothorn et al., “Double-Bagging: Combining Classifiers by Bootstrap Aggregation”, Pattern Recognition, vol. 36, pp. 1303-1309 (2003).
Roberto Perdisci et al., “Detecting Malicious Flux Service Networks Through Passive Analysis of Recursive DNS Traces”, Proceedings of ACSAC, Honolulu, Hawaii, USA (2009) (10 pages).
Shuang Hao et al., “Detecting Spammers with SNARE: Spatiotemporal Network-Level Automatic Reputation Engine”, 18th USENIX Security Symposium, pp. 101-117 (2009).
Kazumichi Sato et al., “Extending Black Domain Name List by Using Co-Occurrence Relation Between DNS Queries”, Presentation in the Third USENIX LEET Workshop (2010) (22 pages).
Sushant Sinha et al., “Shades of Grey: On the Effectiveness of Reputation-Based Blacklists”, In 3rd International Conference on MALWARE (2008) (8 pages).
Zhiyun Qian et al., “On Network-Level Clusters for Spam Detection”, In Proceedings of the USENIX NDSS Symposium (2010) (17 pages).
Bojan Zdrnja et al., “Passive Monitoring of DNS Anomalies”, In Proceedings of DIMVA Conference (2007) (11 pages).
Jian Zhang et al., “Highly Predictive Blacklisting”, In Proceedings of the USENIX Security Symposium (2008) (16 pages).
http://www.uribl.com/about.shtml, retrieved from Internet Archive on Mar. 16, 2016, Archived Jul. 22, 2010 (4 pages).
http://www.spamhaus.org/zen/, retrieved from Internet Archive on Mar. 16, 2016, Archived Jul. 6, 2010 (3 pages).
Matthew Sullivan, “Fighting Spam by Finding and Listing Exploitable Servers”, Apricot 2006 (26 pages).
File History of U.S. Appl. No. 13/205,928, electronically captured from PAIR on Jul. 25, 2016 for Apr. 4, 2016 to Jul. 25, 2016.
File History of U.S. Appl. No. 14/096,803, electronically captured from PAIR on Jul. 25, 2016 for Feb. 12, 2016 to Jul. 25, 2016.
File History of U.S. Appl. No. 14/317,785, electronically captured from PAIR on Jul. 25, 2016 for Feb. 12, 2016 to Jul. 25, 2016.
File History of U.S. Appl. No. 14/616,387, electronically captured from PAIR on Jul. 25, 2016 for Jun. 22, 2015 to Jul. 25, 2016.
File History of U.S. Appl. No. 14/668,329, electronically captured from PAIR on Jul. 25, 2016 for Jun. 22, 2015 to Jul. 25, 2016.
Mekky et al., “Detecting Malicious HTTP Redirections Using Trees of User Browser Activity”, IEEE INFOCOM 2014, pp. 1159-1167.
Related Publications (1)
Number Date Country
20140101759 A1 Apr 2014 US
Provisional Applications (2)
Number Date Country
61292592 Jan 2010 US
61295060 Jan 2010 US
Continuations (1)
Number Date Country
Parent 12985140 Jan 2011 US
Child 14041796 US