The present disclosure relates to domain name system (“DNS”) traffic analysis. More particularly, the present disclosure relates to systems and methods for detecting anomalies in DNS traffic based on DNS lookup data.
In one embodiment, there may be provided a method for analyzing DNS lookup data, comprising: calculating a plurality of traffic scores for a network address based on a set of DNS lookup data associated with the network address, wherein the set of DNS lookup data includes a plurality of query records having one or more queried network addresses; calculating a first variance and a second variance for the network address based on the plurality of traffic scores for the network address; and determining a rank of the network address based on the first and second variances.
In another embodiment, there may be provided a system for analyzing DNS lookup data. The system comprises a processor and a memory communicatively coupled to the processor. The processor can be configured to: calculate a plurality of traffic scores for a network address based on a set of DNS lookup data associated with the network address, wherein the set of DNS lookup data includes a plurality of query records having one or more queried network addresses; calculate a first variance and a second variance for the network address based on the plurality of traffic scores for the network address; and determine a rank of the network address based on the first and second variances.
Additional objects and advantages of the embodiments of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the embodiments. The objects and advantages of the embodiments will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this specification, in connection with the description, illustrate various embodiments and exemplary aspects of the disclosed embodiments. In the drawings:
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When appropriate, the same reference numbers are used throughout the drawings to refer to the same or like parts.
For simplicity and illustrative purposes, the principles of the present teachings are described by referring mainly to exemplary embodiments thereof. However, one of ordinary skill in the art would readily recognize that the same principles are equally applicable to, and can be implemented in, all types of information and systems, and that any such variations do not depart from the true spirit and scope of the present teachings. Moreover, in the following detailed description, references are made to the accompanying figures, which illustrate specific exemplary embodiments. Electrical, mechanical, logical and structural changes may be made to the exemplary embodiments without departing from the spirit and scope of the present teachings. The following detailed description is, therefore, not to be taken in a limiting sense and the scope of the present teachings is defined by the appended claims and their equivalents.
The Domain Name System (“DNS”), the Internet's lookup service for mapping domain names to Internet Protocol (“IP”) addresses, provides a critical infrastructure for Internet applications. The prevalence of DNS lookups can help network operators discover valuable information about the nature of domains that are being looked up. The term “DNS lookup data” may include data related to domain name resolution at different levels of the DNS hierarchy. Top-level domain (“TLD”) DNS lookup data may also include domain name queries submitted by Recursive Name Servers (“RNSs”). The source IP addresses from the domain name queries can be aggregated to characterize the network traffic that was interested in at least one queried domain during an observed window. The IP addresses corresponding to the RNSs requesting the at least one queried domain can further be aggregated into /24 subnets if it is deemed appropriate to mitigate double counting of queries from the same network segment. Therefore, DNS lookup data may include query records showing the relationship between the queried domain name and the queriers.
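The aggregation of querier addresses into /24 subnets may be sketched as follows. This is a minimal illustration only, assuming IPv4 source addresses and a hypothetical record layout of (source IP, queried domain) pairs; the function name `aggregate_by_slash24` and the sample addresses are illustrative and are not part of the disclosure.

```python
import ipaddress
from collections import defaultdict


def aggregate_by_slash24(query_records):
    """Map each queried domain to the set of /24 subnets that queried it.

    query_records: iterable of (source_ip, queried_domain) pairs.
    This record layout is an assumption made for illustration.
    """
    subnets_by_domain = defaultdict(set)
    for source_ip, domain in query_records:
        # strict=False masks the host bits, so 12.34.56.78 maps to 12.34.56.0/24.
        subnet = ipaddress.ip_network(f"{source_ip}/24", strict=False)
        subnets_by_domain[domain].add(str(subnet))
    return dict(subnets_by_domain)


# Two queriers in the same /24 are counted once; the third is a separate subnet.
records = [
    ("12.34.56.78", "example.com"),
    ("12.34.56.90", "example.com"),
    ("98.76.54.32", "example.com"),
]
result = aggregate_by_slash24(records)
```

Storing subnets in a set is one way to avoid double counting queries that originate from the same network segment.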
DNS traffic anomalies may be detected by analyzing DNS lookup data. DNS traffic anomalies may indicate that a particular domain or an entity associated with that particular domain has launched a new product that is garnering a lot of attention, aired a commercial directing traffic to that particular domain, or possibly engaged in more malicious activities like spamming, phishing, malware, or involvement in a botnet. For more information on the detection of malicious domains by analyzing DNS lookup patterns, see Shuang Hao et al., “An Internet-Wide View Into DNS Lookup Patterns,” published June 2010 by VeriSign Incorporated, the entire contents of which are expressly incorporated herein by reference.
In particular, domains associated with scams and botnets may exhibit more churn in terms of the networks that look them up from day to day. Therefore, by analyzing DNS lookup patterns, domains exhibiting anomalous behaviors may be identified, the anomalous behaviors not being limited to malicious activities or other behaviors deemed harmful. In addition, domains exhibiting similar spatial lookup patterns may also exhibit other similarities. Results from prior efforts to determine anomalous behavior may be used to generate groups of domains and filter them based on known behaviors of their neighbors. Moreover, a domain that is newly registered and exhibits unusual traffic may be categorized as having suspicious behaviors. Finally, blacklisted domains may typically be queried by a much wider range of subnets, particularly for newly registered domains.
Alternatively or additionally, traffic processor 104 may calculate a geolocation percentage associated with a target domain based on DNS lookup data. For example, traffic processor 104 may analyze the network addresses of the queriers who query a particular network address, and obtain geolocation information based on the queriers' network addresses and/or geoIP maps. Traffic processor 104 may then group DNS traffic based on the geolocation information, and calculate a percentage for traffic originating from each geolocation. In some embodiments, the geolocation percentage may be calculated for a group of network addresses, such as IP addresses in the same /24 subnet (i.e., a class C subnet). Traffic processor 104 may store the geolocation percentage in a database 108.
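One possible way to compute such geolocation percentages is sketched below. The sketch assumes that each querier address can be resolved to a location label by a caller-supplied lookup function; `geolocation_percentages`, `geoip_lookup`, and the toy address-to-country table are hypothetical names and data used only for illustration.

```python
from collections import Counter


def geolocation_percentages(querier_ips, geoip_lookup):
    """Return {location: percentage of queries} for the given querier addresses.

    geoip_lookup is a hypothetical callable that maps an IP string to a
    location label; a real deployment would consult a geoIP map instead.
    """
    counts = Counter(geoip_lookup(ip) for ip in querier_ips)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {location: 100.0 * n / total for location, n in counts.items()}


# Toy stand-in for a geoIP map (illustrative data only).
TOY_GEOIP = {
    "12.34.56.78": "US",
    "12.34.56.90": "US",
    "11.22.33.44": "US",
    "98.76.54.32": "DE",
}
percentages = geolocation_percentages(
    list(TOY_GEOIP), lambda ip: TOY_GEOIP.get(ip, "unknown")
)
```

With the toy data above, three of the four queriers resolve to one location, so that location accounts for 75 percent of the traffic and the other for 25 percent.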
In the example shown in
Traffic processor 104 may update the daily traffic scores by calculating a moving average. For example, assuming the daily traffic scores shown in data block 204 are results of a first week's operation, when a new week begins (e.g., Sunday), the Sunday traffic score, e.g., 21 of subnet “12.34.56.xx,” may be updated by taking into account a current traffic score, e.g., 25, calculated from DNS lookup data of this Sunday. Specifically, the updated Sunday score may be calculated as a moving average, as follows:
Updated score=(previous score*number of weeks+current score)/(number of weeks+1).
In the above equation, the “number of weeks” refers to the number of scored weeks associated with the previous daily traffic score. Therefore, using the data discussed above, the updated Sunday traffic score would be (21*1+25)/(1+1)=23. If the traffic score of the next Sunday is 14, then the next updated Sunday score would be (23*2+14)/(2+1)=20.
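The moving-average update and the two worked values above may be reproduced with a short sketch such as the following. The function name `update_score` and its parameter names are illustrative assumptions, not terms used in the disclosure.

```python
def update_score(previous_score, weeks_scored, current_score):
    """Moving-average update of a daily traffic score.

    weeks_scored is the number of weeks already folded into previous_score.
    """
    return (previous_score * weeks_scored + current_score) / (weeks_scored + 1)


# First update: previous Sunday score 21 (one week scored), current score 25.
first_update = update_score(21, 1, 25)

# Second update: the running average now covers two weeks; next Sunday scores 14.
second_update = update_score(first_update, 2, 14)
```

Here `first_update` evaluates to 23.0 and `second_update` to 20.0, matching the figures in the text. Only the running average and the week count need to be stored, rather than every past score.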
Similar calculations may be performed for each day of a week. As a result, a moving average of the traffic score for each day of a week can be obtained. As the number of weeks increases, the moving average may exhibit valuable patterns that can be used to determine DNS traffic anomalies. It should be noted that the data structure 200 shown in
Traffic processor 104 may update the geolocation percentage periodically. For example, the update may be performed daily, by calculating a simple average of the previous day's percentage and the current day's percentage. In addition, locations that fail to appear the next day may be removed from data block 304. New locations may also be added to data block 304 if they have appeared for, e.g., two days. Of course, other time periods may be used in the calculation. As with the daily traffic score, normalization may be performed based on various considerations. It should be noted that the data structure 300 shown in
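The daily geolocation update described above may be sketched as follows. Several details are assumptions made for illustration: “appeared for two days” is read as two consecutive days, a newly admitted location is stored with the current day's percentage, and no renormalization is applied after the update. The names `update_geolocation`, `stored`, `pending`, and `today` are hypothetical.

```python
def update_geolocation(stored, pending, today):
    """Apply one daily update to a geolocation-percentage table.

    stored:  {location: percentage} currently held in the data block
    pending: set of locations first seen yesterday and not yet stored
    today:   {location: percentage} computed from today's lookups
    Returns (new_stored, new_pending).
    """
    new_stored = {}
    new_pending = set()
    for location, pct in today.items():
        if location in stored:
            # Simple average of the previous and current percentages.
            new_stored[location] = (stored[location] + pct) / 2
        elif location in pending:
            # Second consecutive appearance: admit it (assumed to use today's value).
            new_stored[location] = pct
        else:
            # First appearance: remember it, but do not store it yet.
            new_pending.add(location)
    # Stored locations absent from today's data are dropped by omission.
    return new_stored, new_pending


stored = {"US": 60.0, "DE": 40.0}
pending = {"FR"}
today = {"US": 70.0, "FR": 20.0, "JP": 10.0}
new_stored, new_pending = update_geolocation(stored, pending, today)
```

In this example, one location is averaged, one is removed because it did not reappear, one is admitted after its second day, and one is held as pending.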
Referring back to
In some embodiments, the traffic score and/or geolocation percentage may be modified based on DNS traffic variations. Such variations may include, for example, holidays. On a holiday, especially when the holiday falls on a weekday, DNS traffic may exhibit a different pattern from that of a non-holiday weekday. Other incidents that may not be considered anomalies include topical trending. For example, when a company releases a new product, the company's domain name may receive a larger than normal number of queries. In another example, when a pharmaceutical company advances to a new clinical trial stage, larger than normal traffic may also be expected. Such incidents, which relate to one or more “topics,” can be referred to as “topical trending.” The system and method disclosed herein may modify traffic scores and/or geolocation percentages resulting from such topical trending.
In the foregoing descriptions, various aspects, steps, or components are grouped together in a single embodiment for purposes of illustrations. The disclosure is not to be interpreted as requiring all of the disclosed variations for the claimed subject matter. The following claims are incorporated into this Description of the Exemplary Embodiments, with each claim standing on its own as a separate embodiment of the invention.
While the teachings have been described with reference to the exemplary embodiments thereof, those skilled in the art will be able to make various modifications to the described embodiments without departing from the true spirit and scope. The terms and descriptions used herein are set forth by way of illustration only and are not meant as limitations. In particular, although the method has been described by examples, the steps of the method may be performed in a different order than illustrated or simultaneously. Furthermore, to the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising.” As used herein, the term “one or more of” with respect to a listing of items such as, for example, A and B, means A alone, B alone, or A and B. Those skilled in the art will recognize that these and other variations are possible within the spirit and scope as defined in the following claims and their equivalents.
This application claims the benefit of, and priority from, U.S. Provisional Patent Application Ser. No. 61/557,255, filed Nov. 8, 2011, which is hereby incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20130117282 A1 | May 2013 | US |
Number | Date | Country | |
---|---|---|---|
61557255 | Nov 2011 | US |