Method and system for detecting malware

Description

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates a system for detecting malware, according to one embodiment.

FIGS. 2-4 illustrate a method for detecting malware, according to one embodiment.

FIG. 5 illustrates various elements involved in domain name resolution.

FIGS. 6-10 illustrate examples for detecting malware, according to several embodiments.

DESCRIPTION OF EMBODIMENTS OF THE INVENTION

FIG. 1 illustrates a system for detecting malware, according to one embodiment. FIG. 1 illustrates at least one network 101 (e.g., the Internet) connecting at least one NX application 105 (described below) on at least one server 120 to at least one honeypot 110 and at least one entity's network 125 (e.g., a private network of a company). The NX application 105 can determine if one or more assets 115 (e.g., computers) on the at least one entity's network 125 is infected with malware. It should be noted that the asset can be a simple asset (e.g., mainframe hardware, storage) or a complex asset (e.g., licensed software).

The determination of whether an asset is infected can comprise: collecting NX domain names from at least one honeypot and at least one asset; using the honeypot NX domain names to create training vectors; using the real network NX domain names to create testing vectors; classifying the testing vectors as benign vectors or malicious vectors; and classifying the at least one asset in the at least one real network as infected if the NX testing vector created from the real network NX domain names is classified as a malicious vector. (It should be noted that the testing vectors can be classified using: simple internal assets infected with known malware; simple internal assets infected with unknown malware; or complex internal network assets; or any combination thereof.)

NX domain name information is useful because some malware takes advantage of existing domain name system (DNS) services such as free domain testing (e.g., determining whether a new domain name is available). Such malware can use a domain name generator that employs a seed, such as the date, together with an algorithm to generate a set of domain names. The command and control (C&C) can try to register the generated domain names until a registrable subset of domain names has been identified. An infected computer can then use that daily-generated set of domain names in order to establish a new communication channel with the C&C. The victim computers will employ the same seed (i.e., date) and algorithm to generate the same set of domain names. The victim computers will then use the generated domain names in attempts to contact the C&C computer. Eventually, each victim computer will find a domain name that was registered for the C&C computer to enable daily communication between the C&C computer and the victim computers. By changing the domain name for the C&C computer (e.g., daily), it becomes difficult to statically blacklist the domain names or the IP addresses of the C&C computer(s).
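The seed-plus-algorithm idea described above can be illustrated with a minimal Python sketch. The hash-based scheme, the function name generate_domains, and the list of top level domains are assumptions chosen purely for illustration; they do not reproduce the generator of any actual malware family. The point shown is only that two parties sharing the same date and algorithm derive the same list of names.

```python
import hashlib
from datetime import date

def generate_domains(seed_date, count=5, tlds=("com", "net", "biz")):
    """Deterministically derive `count` pseudo-random domain names from a date."""
    domains = []
    for i in range(count):
        # Hash the date seed together with a counter so each name differs.
        digest = hashlib.sha256(f"{seed_date.isoformat()}-{i}".encode()).hexdigest()
        # Map the first ten hex digits to the letters a..p to form a label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:10])
        domains.append(f"{label}.{tlds[i % len(tlds)]}")
    return domains

# Both sides use the same seed (the date) and therefore agree on the list.
today = date(2010, 1, 6)
controller_list = generate_domains(today)
victim_list = generate_domains(today)
```

Because most of the generated names are never registered, lookups for them are expected to fail, which is what makes the resulting NX traffic observable.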

Thus, malware which uses the above domain name resolution to establish communication with a C&C can produce many NX-Domains (NXs), which can be domain names that have not been registered with an authoritative DNS and can be observable at a recursive DNS server (“RDNS”). RDNS servers map domain names to IP addresses, also called “resolving DNS queries”. If such a mapping between a domain name and an IP address doesn't exist, the RDNS can send back to the initiator of the DNS query a “Non-Existence” response. The Non-Existence response can indicate that the domain name does not have an IP address, and is thus an NX-Domain (NX). Monitoring the NXs observable at an RDNS can provide the ability to collect all possible NXs generated from all computers connected to the RDNS.
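As a rough sketch of this collection step, the following Python fragment filters DNS responses seen at an RDNS and keeps, per client, the names that received a Non-Existence answer. The record layout (client address, queried name, response code), the function name collect_nx, and the sample addresses are assumptions for illustration; the only protocol fact relied on is that DNS response code 3 denotes a name error (NXDOMAIN).

```python
from collections import defaultdict

NXDOMAIN = 3  # DNS RCODE 3 ("Name Error") as defined in RFC 1035

def collect_nx(responses):
    """Return {client_ip: [names answered with NXDOMAIN]} in arrival order."""
    nx_by_client = defaultdict(list)
    for client_ip, qname, rcode in responses:
        if rcode == NXDOMAIN:
            # Normalize case and drop the trailing root dot, if present.
            nx_by_client[client_ip].append(qname.lower().rstrip("."))
    return dict(nx_by_client)

# Hypothetical response log: (client address, queried name, response code).
responses = [
    ("10.0.0.5", "fpemcjfbv.com.", 3),
    ("10.0.0.5", "www.example.com.", 0),
    ("10.0.0.7", "xvoal.com.", 3),
    ("10.0.0.5", "odkigktjzv.biz.", 3),
]
nx = collect_nx(responses)
```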

FIG. 2 illustrates a method for creating training vectors, according to one embodiment. Referring to FIG. 2, in 205, malware NXs can be collected from at least one honeypot (e.g., an Internet-attached server that acts as a decoy, luring in potential hackers in order to study their activities and monitor how they are able to break into a system) by an NX application 105 and grouped into sets of, for example, 10.

The malware NXs can be collected so that a classifier can be trained in a controlled environment to recognize different categories of infected computers. For example, FIG. 5 illustrates a honeypot network configuration. In this example, the virtual machine named “kritis” operates as an internal, virtual gateway for the virtual machines dns01, dns02 and dns03, which are infected with malware (e.g., sinowal worm, bobax worm). By monitoring the DNS traffic that originates from infected virtual machines dns01, dns02, and dns03, a pure seed of malware domain names can be obtained.

In FIG. 5, the computer called “minoas” can act as an open recursive DNS (ORDNS) server (which can be an RDNS server willing to resolve a domain name for any host in the Internet, inside or outside its network) and as an authoritative DNS server for root DNS servers. By doing this, the minoas computer can provide NXs that appear to originate from the root DNS servers. This can force the malware to look up the next domain name and not stop probing, since the minoas computer does not allow the malware to contact the root servers. As indicated earlier, the malware needs to make contact with the C&C at least one time during a set period (e.g., daily). Thus, by providing NX answers to any domain name that the malware requests, the “minoas” computer can cause the malware to keep looking up all the generated domain names (e.g., 10,000) because no successful C&C connection will take place. In this way, all 10,000 domain names can be observed and can be used to train a statistical class that can identify malware based only on this traffic.

Referring again to FIG. 5, the “kritis” computer can be configured to give free Internet access to the dns01, dns02, and dns03 computers for one hour, and for the next eight hours to redirect the DNS traffic to the “minoas” computer. A simple IP table firewall “rotating rule” at the gateway point (e.g., at the “kritis” computer) can be used to do this.
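The timing of such a rotating rule can be sketched as follows. This is only a model of the schedule, not firewall configuration: the function name gateway_mode and the assumption that the one-hour and eight-hour phases repeat as a nine-hour cycle are illustrative.

```python
FREE_MINUTES = 60        # one hour of free Internet access
REDIRECT_MINUTES = 480   # eight hours of DNS redirection to "minoas"

def gateway_mode(minutes_since_start):
    """Return which rule is active at the gateway for a given elapsed time."""
    # Assumption: the two phases repeat back to back as one cycle.
    position = minutes_since_start % (FREE_MINUTES + REDIRECT_MINUTES)
    return "free" if position < FREE_MINUTES else "redirect"
```

For instance, gateway_mode(30) falls in the free-access hour, while gateway_mode(120) falls in the redirection phase.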

The VMNET 34 computer in FIG. 5 can be a virtual network connection between the virtual machines dns01, dns02, and dns03, and the virtual gateway “kritis”.

Referring back to FIG. 2, in 210, training vectors can be created by taking each set of, for example, 10 domain names, computing various statistical values, and putting the various statistical values in a vector. Example statistics are illustrated in FIG. 6, which is described in more detail below.

Those of ordinary skill in the art will see that training vectors can be created in many other ways, in addition to collecting NXs from honeypots, as described above.

FIG. 3 illustrates a method for creating testing vectors, according to one embodiment. In 305, NXs are collected from a real network. In 310, the NXs from the real network can be used to create testing vectors by taking each set of, for example, 10 NX domain names, computing various statistical values, and putting the various statistical values in a vector. (It should be noted that both the honeypot NXs and the real network NXs can be grouped in any number, and any algorithm can be used to group the sets.) It is not known if the testing NXs are malware or not. Thus, in 315, the testing vectors can be classified as benign vectors or malicious vectors by comparing testing vectors to training vectors. A classifier can use the knowledge obtained from the statistical information from the training vectors and compare it to the statistical information from the testing vectors to identify each different malware family in the testing NX vectors. FIG. 8 illustrates several types of classifiers that can be used to compare the vector information and identify different malware families. In particular, FIG. 8 illustrates the following classifiers: Naïve Bayes, LAD Tree, Multi-Layer Perceptron, Logistic Regression, and IBK Lazy. Those of ordinary skill in the art will see that many other types of classifiers can also be used. In addition, as explained in more detail below with respect to FIG. 8, a meta-classifier can use many different types of classifiers. In some embodiments, as also described in more detail below with respect to FIG. 8, a confidence score can also be given for each classifier, as well as for the meta-classifier.

For example, an absolute timing sequence, which can list the domain names in the order that they are received, can be used to group together an example set of ten NX domain names (e.g., from a real network):

fpemcjfbv.com

odkigktjzv.biz

odkigktjzv.biz.ebay.com

l-sjn-sevans.ca1.paypal.com

xvoal.com

ymtaiwwprpq.biz

ymtaiwwprpq.biz.ebay.com

bcbkdfkg.net

bcbkdfkg.net.ebay.com

okxixsulas.net

An example of various statistical values that can be computed for the set of NX domain names is illustrated in FIG. 6. Note that many other types of statistical values can be computed, and that the vector can have more or less statistical values than that called for in FIG. 6 (e.g., 17). Thus, for the example of 10 NX domain names provided above, the following statistical values can be computed. It should be noted that some or all of these statistical values can be computed. In addition, other statistical values can be computed and used.

- The average of domain name length (not including “.”) (e.g., the domain name length of the first domain name is 13). [Value≈12.8333]
- The standard deviation of the domain name length. [Value≈1.9507]
- The number of different Top Level Domains (TLDs). [Value≈3.0]
- The length of the longest domain name (excluding the TLD). [Value≈24.0]
- The median of the frequency of each unique character across the entire set of domain names (e.g., the frequency of “o” across the entire set of 10 domain names above is 10). [Value≈2.0]
- The average frequency of each unique character across the entire set of domain names. [Value≈2.2083]
- The standard deviation of the frequency of each unique character across the entire set of domain names. [Value≈0.9565]
- The median of the frequency of each unique 2-gram across the entire set of 10 domain names (e.g., the frequency of “fp” across the entire set of 10 domain names above is 1) (Note that if there is a “.” (e.g., “v.c”) between two characters, the frequency is counted as 0.) [Value≈0.9565]
- The average of the frequency of each unique 2-gram across the entire set of 10 domain names. [Value≈1.0]
- The standard deviation of the frequency of each unique 2-gram across the entire set of 10 domain names. [Value≈1.0]
- The frequency of .com TLDs over the frequency of the other TLDs. [Value≈1.5]
- The median of the frequency of each unique 3-gram across the entire set of 10 domain names. [Value≈0.3333]
- The average of the frequency of each unique 3-gram across the entire set of 10 domain names. [Value≈1.0]
- The standard deviation of the frequency of each unique 3-gram across the entire set of 10 domain names. [Value≈1.0]
- The median count of unique TLDs (excluding .com). [Value≈2.0]
- The average count of unique TLDs (excluding .com). [Value≈2.0]
- The standard deviation for the different frequencies for each different TLD in the set of domain names. [Value≈2.0]
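A few of the length and TLD statistics listed above can be sketched in Python as follows, using the ten example NX domain names given earlier. The function names, the use of a population standard deviation, and the choice to count dots in the longest name but not in the per-name length are assumptions, so the sketch is not claimed to reproduce every bracketed value; it is meant only to show how a set of names is reduced to numbers.

```python
from statistics import mean, pstdev

NAMES = [
    "fpemcjfbv.com", "odkigktjzv.biz", "odkigktjzv.biz.ebay.com",
    "l-sjn-sevans.ca1.paypal.com", "xvoal.com", "ymtaiwwprpq.biz",
    "ymtaiwwprpq.biz.ebay.com", "bcbkdfkg.net", "bcbkdfkg.net.ebay.com",
    "okxixsulas.net",
]

def tld_of(name):
    """Return the last dot-separated label of a domain name."""
    return name.rsplit(".", 1)[-1]

def length_and_tld_features(names):
    lengths = [len(n.replace(".", "")) for n in names]  # dots not counted
    tlds = [tld_of(n) for n in names]
    com = tlds.count("com")
    other = len(tlds) - com
    return {
        "avg_length": mean(lengths),
        "std_length": pstdev(lengths),  # assumption: population deviation
        "num_tlds": len(set(tlds)),
        # Assumption: the TLD is stripped but interior dots are counted.
        "longest_without_tld": max(len(n.rsplit(".", 1)[0]) for n in names),
        # Ratio of .com names to all other names; infinite if no others exist.
        "com_ratio": com / other if other else float("inf"),
    }

features = length_and_tld_features(NAMES)
```

For this set, the sketch finds three distinct TLDs and a .com-to-other ratio of 1.5, in line with the corresponding values listed above.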

The various statistical values for each set of 10 domain names from the real network NXs can be put in a vector. An example illustrating the domain names being transformed to statistical vectors, using the statistical values set forth in FIG. 6, is illustrated in FIG. 7. Referring to FIG. 7, in 705, the 10 domain names used to create the vector are listed. Note that all of these domain names can come from one particular asset 115 (e.g., an infected computer) in the real network 125:

fpemcjfbv.com

odkigktjzv.biz

odkigktjzv.biz.inter1.com

l-sjn-sevans.ca1.intern2.com

xvoal.com

ymtaiwwprpq.biz

ymtaiwwprpq.biz.inter1.com

bcbkdfkg.net

bcbkdfkg.net.inter1.com

okxixsulas.net

The 17 statistical values corresponding to the statistical values found in FIG. 6 are illustrated in the vector 710: [12.8333, 1.9507, 3.0, 24.0, 2.0, 2.2083, 0.9565, 0.9565, 1.0, 1.0, 1.5, 0.3333, 1.0, 1.0, 0.0, 2.0, 2.0, 2.0].
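The character and n-gram entries of such a vector can be sketched as below, using the ten domain names listed at 705. Splitting each name at the dots is one way to honor the rule that an n-gram spanning a “.” is not counted; that choice, the inclusion of TLD labels in the counts, the function names, and the use of a population standard deviation are assumptions of this sketch rather than details taken from FIG. 6.

```python
from collections import Counter
from statistics import mean, median, pstdev

NAMES_705 = [
    "fpemcjfbv.com", "odkigktjzv.biz", "odkigktjzv.biz.inter1.com",
    "l-sjn-sevans.ca1.intern2.com", "xvoal.com", "ymtaiwwprpq.biz",
    "ymtaiwwprpq.biz.inter1.com", "bcbkdfkg.net", "bcbkdfkg.net.inter1.com",
    "okxixsulas.net",
]

def ngram_counts(names, n):
    """Count n-grams inside each dot-separated label, so none spans a '.'."""
    counts = Counter()
    for name in names:
        for label in name.split("."):
            for i in range(len(label) - n + 1):
                counts[label[i:i + n]] += 1
    return counts

def frequency_stats(names, n):
    """Return [median, average, standard deviation] of the n-gram frequencies."""
    freqs = list(ngram_counts(names, n).values())
    return [median(freqs), mean(freqs), pstdev(freqs)]

char_counts = ngram_counts(NAMES_705, 1)
bigram_counts = ngram_counts(NAMES_705, 2)
ngram_part = (frequency_stats(NAMES_705, 1)
              + frequency_stats(NAMES_705, 2)
              + frequency_stats(NAMES_705, 3))
```

Under these assumptions the character “o” is counted 10 times and the 2-gram “fp” once, consistent with the examples given in the list of statistical values.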

The NX application 105 can then utilize a meta-classifier to classify the testing vectors. The meta-classifier is a hybrid classifier and can comprise several generic classifiers. The various generic classifiers can be used (e.g., in parallel) to capture various different statistical properties which can potentially lower false positives (FP) and increase true positives (TP).

For example, FIG. 8 illustrates a meta-classifier that is comprised of five different classifiers: the Naïve Bayes classifier 805, the LAD Tree classifier 810, the Multi-Layer Perceptron Neural Network classifier 815, the Logistic Regression classifier 820, and the IBK Lazy Classifier 825. The maximum probability includes the classification (given by a particular classifier for the malware) and the probability of this classification being correct. Thus, for example, five different types of classifiers can be used to classify the malware as follows:

Classifier 1 (Naive Bayes Meta.) is: notknown (Confidence: 1)
Classifier 2 (Multi Layer Per. Meta.) is: conficker-B (Confidence: 0.985572986223)
Classifier 3 (Logistic Regression Meta.) is: conficker-B (Confidence: 0.374297598072)
Classifier 4 (LADtree Meta.) is: conficker-B (Confidence: 0.220571723953)
Classifier 5 (Lazy IB1 Meta.) is: conficker-B (Confidence: 1)

The majority voting can take the many classifications and determine which classification the majority of classifiers found. Thus, for the example above, conficker-B was the classification the majority of classifiers classified the malware as. The final class is the final classification based on the majority voting, which is conficker-B.
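The voting step can be sketched in a few lines of Python. The function name majority_vote is hypothetical, and the tie-breaking behavior (the label encountered first wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter

def majority_vote(labels):
    """Return (winning label, number of votes); the first-seen label wins ties."""
    winner, votes = Counter(labels).most_common(1)[0]
    return winner, votes

# Labels reported by the five classifiers in the example above.
labels = ["notknown", "conficker-B", "conficker-B", "conficker-B", "conficker-B"]
final_class, votes = majority_vote(labels)
```

Here final_class is conficker-B with four of the five votes.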

It should be noted that the meta-classifier can use any number and any type of known or unknown classifier, including, but not limited to, the above classifiers. The Naïve Bayes classifier can use estimator classes. Numeric estimator precision values can be chosen based on analysis of the training data. The LAD tree classifier can generate a multi-class alternating decision tree using a LogitBoost strategy. The Multi-Layer Perceptron Neural Network classifier can use back-propagation to classify instances. The Logistic Regression classifier can build linear logistic regression models. LogitBoost with simple regression can function as a base learner and can be used for fitting the logistic models. The IBK Lazy classifier can use normalized Euclidean distance to find the training instance closest to the given test instance, and can predict the same class as the training instance. If multiple instances have the same (smallest) distance to the test instance, the first one found can be used.
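The behavior described for the IBK Lazy classifier can be approximated by a small nearest-neighbor sketch. This is not the implementation of any particular library; the min-max scaling used for normalization, the function name nearest_neighbor_label, and the two-feature toy vectors are assumptions.

```python
import math

def nearest_neighbor_label(train, test_vector):
    """1-nearest-neighbor using Euclidean distance on min-max scaled features."""
    vectors = [v for v, _ in train]
    dims = range(len(test_vector))
    lo = [min(v[d] for v in vectors) for d in dims]
    hi = [max(v[d] for v in vectors) for d in dims]

    def norm(v):
        # Scale each feature by the training range; constant features map to 0.
        return [(v[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
                for d in dims]

    target = norm(test_vector)
    best_label, best_dist = None, math.inf
    for vector, label in train:
        dist = math.dist(norm(vector), target)
        if dist < best_dist:  # strict '<' keeps the first instance on ties
            best_label, best_dist = label, dist
    return best_label

# Toy training data: (feature vector, class label).
train = [
    ([12.8, 3.0], "conficker-B"),
    ([7.0, 1.0], "benign"),
    ([12.8, 3.0], "duplicate"),
]
predicted = nearest_neighbor_label(train, [12.0, 3.0])
```

The first and third training instances are equally close to the test vector, so the first one found determines the prediction.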

Additional information about all of the above classifiers can be found in Richard O. Duda et al., PATTERN CLASSIFICATION (2nd Edition), which is herein incorporated by reference. Further information about the IBK Lazy classifier can be found in Niels Landwehr et al., LOGISTIC MODEL TREES (2005), which is also herein incorporated by reference.

For example, each classifier in the meta-classifier can classify vector 710 as follows:

Classifier 1 (Naive Bayes Meta.) is: notknown (Confidence: 1)
Classifier 2 (Multi Layer Per. Meta.) is: conficker-B (Confidence: 0.985572986223)
Classifier 3 (Logistic Regression Meta.) is: conficker-B (Confidence: 0.374297598072)
Classifier 4 (LADtree Meta.) is: conficker-B (Confidence: 0.220571723953)
Classifier 5 (Lazy IB1 Meta.) is: conficker-B (Confidence: 1)

Using the classification of the vector by each classifier, if a confidence threshold is set to be >=0.9 (note that this value can be set by the user), the meta-classifier can classify the vector (or statistical instance) as follows:

Instance 1 Meta classification detection result: conficker-B with majority voting value: 4 with confidence (med/std): (0.985572986223/0.345308923709). This means that a majority of four (out of five) of the classifiers found the vector to be classified as conficker-B. The median confidence score is the median of all five of the confidence scores, divided by the standard deviation of all five of the classifiers. It should be noted that, because the confidence threshold is set to be >=0.9, this number is only meaningful if the median confidence score is >=0.9.
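The summary figures in this result can be recomputed from the five per-classifier outputs with the following sketch. The function name meta_classify and the returned dictionary layout are hypothetical, and the use of a population standard deviation is an assumption, chosen because it closely matches the quoted deviation.

```python
from collections import Counter
from statistics import median, pstdev

# (label, confidence) pairs reported for vector 710.
results = [
    ("notknown", 1.0),
    ("conficker-B", 0.985572986223),
    ("conficker-B", 0.374297598072),
    ("conficker-B", 0.220571723953),
    ("conficker-B", 1.0),
]

def meta_classify(results, threshold=0.9):
    """Summarize per-classifier outputs into one meta-classification."""
    label, votes = Counter(lbl for lbl, _ in results).most_common(1)[0]
    confidences = [conf for _, conf in results]
    med = median(confidences)
    std = pstdev(confidences)  # assumption: population standard deviation
    return {
        "label": label,
        "votes": votes,
        "median": med,
        "std": std,
        # The result is treated as meaningful only at or above the threshold.
        "meets_threshold": med >= threshold,
    }

summary = meta_classify(results)
```

With the default threshold of 0.9, the median confidence of 0.985572986223 satisfies the condition, so the conficker-B result would be reported.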

FIG. 9 illustrates False Positive (FP) and True Positive (TP) classification results from the meta-classifier of FIG. 8 to one of six different malware classes: conficker-A, conficker-B, conficker-C, sinowal, bobax, and unknown. FIG. 9 indicates a FP value and a TP value for each type of malware. The FP rate is the False Positive detection rate for each different class. The TP rate is the True Positive detection rate for each different class. The FP rate can correspond to the percentage of vectors mistakenly classified as malicious which were actually benign. The TP rate can correspond to the percentage of vectors classified as malicious that were actually malicious. The following article, which is herein incorporated by reference, describes FP and TP rates in more detail: Axelsson, S., The Base-Rate Fallacy and the Difficulty of Intrusion Detection, ACM TRANS. INF. SYST. SECUR. 3, 3 (August 2000), 186-205.
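Per-class rates of this kind can be computed as in the sketch below, which uses the conventional one-vs-rest definitions (TP rate = TP / (TP + FN), FP rate = FP / (FP + TN)). Applying those conventional definitions here is an assumption, and the label lists are made-up sample data, not results from FIG. 9.

```python
def per_class_rates(true_labels, predicted_labels, target):
    """Return (TP rate, FP rate) for `target`, treating other classes as 'rest'."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    tn = sum(t != target and p != target for t, p in pairs)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate

# Hypothetical ground truth and predictions for six vectors.
true_labels = ["conficker-B", "conficker-B", "sinowal", "bobax", "conficker-B", "sinowal"]
predicted = ["conficker-B", "conficker-B", "conficker-B", "bobax", "sinowal", "sinowal"]
tp_rate, fp_rate = per_class_rates(true_labels, predicted, "conficker-B")
```

In this toy data, two of the three conficker-B vectors are found and one of the three other vectors is wrongly labeled conficker-B.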

It should be noted that the meta-classifier can be independent from the manner in which the NXs are collected. It is only necessary to keep a mapping between the NXs and the internal asset that the NXs originated from. The detection flow is satisfied as long as the monitoring system in the real network collects NXs from the same internal asset and groups them into sets of 10 using the absolute timing sequence. This is because the classifier can be trained to detect such behavior. Thus, the trained classifier can utilize domain names collected in the same way in real time.
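The per-asset grouping can be sketched as a small generator. The event layout (asset identifier, domain name), the function name group_by_asset, and the host names are assumptions; the events are assumed to arrive already ordered by time.

```python
from collections import defaultdict

def group_by_asset(events, set_size=10):
    """Yield (asset, [names]) for every full set, preserving arrival order."""
    pending = defaultdict(list)
    for asset, name in events:  # events are assumed to be in arrival order
        pending[asset].append(name)
        if len(pending[asset]) == set_size:
            # Emit the completed set and start a new one for this asset.
            yield asset, pending.pop(asset)

# Hypothetical interleaved NX events from two internal assets.
events = []
for i in range(12):
    events.append(("host-a", f"a{i}.com"))
    if i % 4 == 0:
        events.append(("host-b", f"b{i}.net"))

groups = list(group_by_asset(events))
```

Only host-a has produced a full set of 10 names in this example; the remaining names stay pending until enough NXs arrive from the same asset.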

FIG. 9 also illustrates how domain names from known malware (e.g., sinowal, bobax, conficker) can be classified by the meta-classifier using information learned from the training set. Domain names that do not match the training set can be classified as “notknown” or “unknownBot”. Hand verification or other methods can be used to classify the “notknown” domain names.

It should be noted that if many NXs are classified as “unknown”, either a DNS issue causes such characterization, or the NXs are from malware where little or no information about the malware is known (e.g., a new worm). DNS issues can include a DNS outage or DNS misconfiguration. If a DNS issue is the cause of the high number of “unknown” classifications, the NXs can be classified as legitimate using, for example, alexa.com or a passive DNS feed. A passive DNS feed can be a technology which constructs zone replicas without cooperation from zone administrators, based on captured name server responses (see, e.g., F. Weimer, Passive DNS Replications, http://www.enyo.de/fw/software/dnslogger/2007, which is herein incorporated by reference). An example of a passive DNS feed is DNSParse, which can be, for example, an implementation of the passive DNS technology by the University of Auckland in New Zealand (see, e.g., https://dnsparse.insec.auckland.ac.nz/dns/2007, which is herein incorporated by reference).

FIG. 10 illustrates an example of how to identify similar patterns in NXs and use those patterns to train a new class (e.g., an unknown-bot class). For example, reviewing the NXs of FIG. 10, a person or computer program could determine malware patterns such as, but not limited to: a size of 8 (after www) with a top level domain of .com.
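A pattern of this kind can be expressed as a regular expression, as in the sketch below. The exact character class allowed in the 8-character label and the function name matches_pattern are assumptions, and the sample names in the usage note are hypothetical rather than taken from FIG. 10.

```python
import re

# "www", then a label of exactly 8 characters, then the .com TLD.
PATTERN = re.compile(r"www\.[a-z0-9-]{8}\.com")

def matches_pattern(name):
    """Return True if the whole name fits the www.<8 characters>.com shape."""
    return PATTERN.fullmatch(name.lower()) is not None
```

For example, matches_pattern("www.abcdefgh.com") is true, while a nine-character label or a .net name does not match. NXs that match such a pattern could then be grouped and used as training data for the new class.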

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail can be made therein without departing from the spirit and scope of the present invention. Thus, the present invention should not be limited by any of the above-described exemplary embodiments.

In addition, it should be understood that the figures described above, which highlight the functionality and advantages of the present invention, are presented for example purposes only. The architecture of the present invention is sufficiently flexible and configurable, such that it may be utilized in ways other than that shown in the figures.

Further, the purpose of the Abstract of the Disclosure is to enable the U.S. Patent and Trademark Office and the public generally, and especially the scientists, engineers and practitioners in the art who are not familiar with patent or legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of the technical disclosure of the application. The Abstract of the Disclosure is not intended to be limiting as to the scope of the present invention in any way.

Finally, it is the applicant's intent that only claims that include the express language “means for” or “step for” be interpreted under 35 U.S.C. 112, paragraph 6. Claims that do not expressly include the phrase “means for” or “step for” are not to be interpreted under 35 U.S.C. 112, paragraph 6.

Claims

1. A method of analysis, comprising: collecting, using at least one processor circuit in communication with at least one database, NX domain names from at least one asset in at least one real network, the NX domain names being domain names that are not registered; utilizing, using the at least one processor circuit in communication with at least one database, statistical information about the NX domain names to create testing vectors; and classifying, using the at least one processor circuit in communication with at least one database, the testing vectors as benign vectors or malicious vectors based on training vectors by comparing the statistical information in the testing vectors to statistical information in training vectors, the statistical information comprising: an average of domain name length; a standard deviation of a domain name length; a number of different top level domains; a length of a domain name excluding a top level domain; a median of a number of unique characters; an average of a number of unique characters; a standard deviation of a number of unique characters; a median of unique 2-grams; an average of unique 2-grams; a standard deviation of unique 2-grams; a frequency of .com top level domains over frequency of remaining top level domains; a median of unique 3-grams; an average of unique 3-grams; a standard deviation of unique 3-grams; a median count of unique top level domains; an average count of unique top level domains; or a standard deviation count of top level domains; or any combination thereof.
2. The method of claim 1, further comprising using at least one meta-classifier comprising at least two classifiers.
3. The method of claim 2, wherein the meta-classifier provides intelligence for identifying new malware.
4. The method of claim 1, wherein only NX domain traffic is utilized.
5. The method of claim 1, wherein similar patterns in NX domain names are identified and used to model new botnets.
6. A system of analysis, comprising: at least one processor circuit in communication with at least one database, the at least one processor circuit connected to at least one network and configured for: collecting NX domain names from at least one asset in at least one real network, the NX domain names being domain names that are not registered; utilizing statistical information about the NX domain names to create testing vectors; and classifying the testing vectors as benign vectors or malicious vectors based on training vectors by comparing the statistical information in the testing vectors to statistical information in training vectors, the statistical information comprising: an average of domain name length; a standard deviation of a domain name length; a number of different top level domains; a length of a domain name excluding a top level domain; a median of a number of unique characters; an average of a number of unique characters; a standard deviation of a number of unique characters; a median of unique 2-grams; an average of unique 2-grams; a standard deviation of unique 2-grams; a frequency of .com top level domains over frequency of remaining top level domains; a median of unique 3-grams; an average of unique 3-grams; a standard deviation of unique 3-grams; a median count of unique top level domains; an average count of unique top level domains; or a standard deviation count of top level domains; or any combination thereof.
7. The system of claim 6, further comprising using at least one meta-classifier comprising at least two classifiers.
8. The system of claim 7, wherein the meta-classifier provides intelligence for identifying new malware.
9. The system of claim 6, wherein only NX domain traffic is utilized.
10. The system of claim 6, wherein similar patterns in NX domain names are identified and used to model new botnets.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 12/985,140 filed Jan. 5, 2011, which claims benefit of U.S. Provisional Patent Application No. 61/292,592 filed Jan. 6, 2010, and U.S. Provisional Patent Application No. 61/295,060 filed Jan. 14, 2010, the contents of which are incorporated herein by reference in their entireties.

File History of U.S. Appl. No. 14/096,803.
File History of U.S. Appl. No. 14/194,076.
File History of U.S. Appl. No. 14/305,998.
File History of U.S. Appl. No. 14/317,785.
File History of U.S. Appl. No. 14/304,015.
File History of U.S. Appl. No. 14/616,387.
File History of U.S. Appl. No. 14/668,329.
File History of U.S. Appl. No. 12/538,612, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 13/205,928, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 13/749,205, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/015,582, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/015,663, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/015,704, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/015,661, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/096,803, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/305,998, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 14/317,785, electronically captured from PAIR on Feb. 12, 2016 for Nov. 19, 2015 to Feb. 12, 2016.
File History of U.S. Appl. No. 15/019,272, electronically captured from PAIR on Feb. 12, 2016.
File History of U.S. Appl. No. 12/538,612, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
File History of U.S. Appl. No. 13/205,928, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
File History of U.S. Appl. No. 13/309,202, electronically captured from PAIR on Apr. 4, 2016 for Nov. 19, 2015 to Apr. 4, 2016.
File History of U.S. Appl. No. 14/015,582, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
File History of U.S. Appl. No. 14/015,704, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
File History of U.S. Appl. No. 14/194,076, electronically captured from PAIR on Apr. 4, 2016 for Nov. 19, 2015 to Apr. 4, 2016.
File History of U.S. Appl. No. 14/305,998, electronically captured from PAIR on Apr. 4, 2016 for Feb. 12, 2016 to Apr. 4, 2016.
Leo Breiman, “Bagging Predictors”, Machine Learning, vol. 24, pp. 123-140 (1996).
David S. Anderson et al., “Spamscatter: Characterizing Internet Scam Hosting Infrastructure”, Proceedings of the USENIX Security Symposium (2007) (14 pages).
Sujata Garera et al., “A Framework for Detection and Measurement of Phishing Attacks”, WORM'07, pp. 1-8, Nov. 2, 2007.
Torsten Hothorn et al., “Double-Bagging: Combining Classifiers by Bootstrap Aggregation”, Pattern Recognition, vol. 36, pp. 1303-1309 (2003).
Roberto Perdisci et al., “Detecting Malicious Flux Service Networks Through Passive Analysis of Recursive DNS Traces”, Proceedings of ACSAC, Honolulu, Hawaii, USA (2009) (10 pages).
Shuang Hao et al., “Detecting Spammers with SNARE: Spatiotemporal Network-Level Automatic Reputation Engine”, 18th USENIX Security Symposium, pp. 101-117 (2009).
Kazumichi Sato et al., “Extending Black Domain Name List by Using Co-Occurrence Relation Between DNS Queries”, Presentation in the Third USENIX LEET Workshop (2010) (22 pages).
Sushant Sinha et al., “Shades of Grey: On the Effectiveness of Reputation-Based Blacklists”, In 3rd International Conference on MALWARE (2008) (8 pages).
Zhiyun Qian et al., “On Network-Level Clusters for Spam Detection”, In Proceedings of the USENIX NDSS Symposium (2010) (17 pages).
Bojan Zdrnja et al., “Passive Monitoring of DNS Anomalies”, In Proceedings of DIMVA Conference (2007) (11 pages).
Jian Zhang et al., “Highly Predictive Blacklisting”, In Proceedings of the USENIX Security Symposium (2008) (16 pages).
http://www.uribl.com/about.shtml, retrieved from Internet Archive on Mar. 16, 2016, Archived Jul. 22, 2010 (4 pages).
http://www.spamhaus.org/zen/, retrieved from Internet Archive on Mar. 16, 2016, Archived Jul. 6, 2010 (3 pages).
Mathew Sullivan, “Fighting Spam by Finding and Listing Exploitable Servers”, Apricot 2006 (26 pages).
File History of U.S. Appl. No. 13/205,928, electronically captured from PAIR on Jul. 25, 2016 for Apr. 4, 2016 to Jul. 25, 2016.
File History of U.S. Appl. No. 14/096,803, electronically captured from PAIR on Jul. 25, 2016 for Feb. 12, 2016 to Jul. 25, 2016.
File History of U.S. Appl. No. 14/317,785, electronically captured from PAIR on Jul. 25, 2016 for Feb. 12, 2016 to Jul. 25, 2016.
File History of U.S. Appl. No. 14/616,387, electronically captured from PAIR on Jul. 25, 2016 for Jun. 22, 2015 to Jul. 25, 2016.
File History of U.S. Appl. No. 14/668,329, electronically captured from PAIR on Jul. 25, 2016 for Jun. 22, 2015 to Jul. 25, 2016.
Mekky et al., “Detecting Malicious HTTP Redirections Using Trees of User Browser Activity”, IEEE INFOCOM 2014, pp. 1159-1167.

Related Publications (1)

	Number	Date	Country
	20140101759 A1	Apr 2014	US

Provisional Applications (2)

	Number	Date	Country
	61292592	Jan 2010	US
	61295060	Jan 2010	US

Continuations (1)

	Number	Date	Country
Parent	12985140	Jan 2011	US
Child	14041796		US
