The present invention is directed to embodiments of a new process for augmenting network traffic flow reports with domain name information.
Computing machines, such as gateway and/or network equipment (e.g., routers), are typically configured to export network flow reports. These reports include information regarding incoming/outgoing network traffic (e.g., Internet Protocol (“IP”) addresses) as it enters or exits the machine(s), and generally provide an overview of IP endpoints (whether internal or external in relation to the local network), as well as data rates and the amount of data sent and received. The two most popular standards for network flow reports are Cisco NetFlow and IPFIX.
Enterprises, such as antivirus (AV) software providers, often utilize the reports to analyze and optimize bandwidth structure (e.g., user bandwidth usage patterns), conduct system issue investigations, and perform security assessments and/or identify anomalies. When assessing machine or network security, for example, these reports are usually used to detect intrusion attempts and infected hardware/software on a local network (e.g., for malicious agents, such as malware or viruses). Malware/command and control (C&C) host signature databases or complex behavioral/machine learning analysis techniques can also be used to help identify these issues.
However, conventional reports (which are usually based on Internet Protocol version 4 [IPv4] and/or 6 [IPv6]) are generally unreliable for bandwidth optimization or security assessments, insofar as IP address to Domain Name System (DNS) resolution is concerned. These reports indicate only the destination IP addresses (purely numeric identifiers, e.g., numbers and dots in IPv4), whereas it is far more useful to know the actual domain name(s) (e.g., www.avg.com) that users intended to access. The fact that user DNS queries and the connections subsequently made are not “linked” to one another also complicates matters.
Reverse DNS querying is one existing approach to address this issue. But because DNS is dynamic and changes frequently (and also since DNS implements an aliasing technique, i.e., CNAME), this approach often fails to reveal all the domain names corresponding to reported IP addresses. For example, two consecutive requests for the same address may result in two different responses (i.e., due to load balancing); moreover, changes occur frequently without notice.
As an example, a NetFlow report on traffic from a desktop computer might include the following line item: 2016-02-26 32:15:32.434 1.030 TCP 192.168.0.1:42343->10.0.226.24:80 X XXXXX X. This line indicates outgoing traffic to a server having the IP address “10.0.226.24”. Reverse DNS querying this address might reveal the domain name “apps-build-prod-idc-ams001.mgm.avg.com”. However, an error message might appear if a web browser application is directed to access this domain. This could occur if the server actually serves two virtual hosts that are accessible under different domain names (e.g., jenkins.avg-labs.com and sonar.avg-labs.com), both pointing to “apps-build-prod-idc-ams001.mgm.avg.com” (note that the DNS allows one domain name to reference another). Thus, depending on which domain name is input to the web browser application, a different web application might be served from the same destination server machine.
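The limitation described above can be illustrated with a toy model, written here in Python. The record tables below are hypothetical in-memory stand-ins for real DNS data, populated only with the names and address from the example; the function names `resolve` and `reverse_lookup` are likewise illustrative. The sketch shows that both alias names resolve forward to the same address, while a reverse lookup on that address can return only the canonical host name.

```python
from typing import Optional

CANONICAL = "apps-build-prod-idc-ams001.mgm.avg.com"

# Hypothetical forward records: two CNAME aliases and one A record.
FORWARD = {
    "jenkins.avg-labs.com": ("CNAME", CANONICAL),
    "sonar.avg-labs.com": ("CNAME", CANONICAL),
    CANONICAL: ("A", "10.0.226.24"),
}

# Hypothetical reverse (PTR) records: one name per address.
PTR = {"10.0.226.24": CANONICAL}


def resolve(name: str, max_hops: int = 8) -> Optional[str]:
    """Follow CNAME aliases until an A record is reached."""
    for _ in range(max_hops):
        record = FORWARD.get(name)
        if record is None:
            return None
        rtype, value = record
        if rtype == "A":
            return value
        name = value  # CNAME: continue with the alias target
    return None


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR name for an address, if any."""
    return PTR.get(ip)
```

In this model, `reverse_lookup("10.0.226.24")` yields only the canonical name, so the alias that the user actually requested cannot be recovered from the address alone.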
As another example, as depicted in the NetFlow report of
In this example, two domain names might result: “evproc.com” and “li646-101.members.linode.com”. This is because the address “212.71.233.101” is used by a remote server for two different web applications—one for serving evproc.com (normal software) and another for serving hedgestash.com (harmful/phishing software). Depending on the DNS name used in the original request (for which traffic has been captured in the network flow report), the server will serve different web applications; it might, for example, serve evproc.com by default. If the original user web request was to access “hedgestash.com”, however, it would be difficult to determine this merely from conventional network flow reports. Existing network flow algorithms simply do not capture important parameters of connections (e.g., DNS name of host) for popular protocols, such as Hypertext Transfer Protocol (HTTP), Hypertext Transfer Protocol Secure (HTTPS), Simple Mail Transfer Protocol (SMTP), and the like. In fact, as described above, DNS is dynamic in nature. Thus, hedgestash.com may have existed only for a short time, after which it may disappear with little to no trace.
It would thus be beneficial to identify, for one or more line items in a network flow report, the original or actual DNS name used to access the destination resource(s)/server(s). This can be referred to as a “mapping” of DNS queries (made “at the moment of the request”) to network flows.
Generally speaking, it is an object of the present invention to enhance the operation of security applications and/or the analysis of network traffic flow reports during security assessments, by augmenting the reports with DNS information.
According to an exemplary embodiment of the present invention, a method for augmenting network traffic flow data with domain name service (“DNS”) information is provided. The method involves a networking device having at least one data processor, and includes monitoring DNS response traffic through a network, extracting at least one domain name record from the response traffic that corresponds to at least one domain name submitted in at least one web request, and providing the at least one domain name record for inclusion in the network traffic flow data.
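By way of illustration only, the extracting step might be sketched in Python as follows. The specification does not prescribe how a DNS response is parsed; this sketch assumes the raw DNS message payload (e.g., of a UDP packet from port 53) is already available as bytes and follows the standard DNS message layout. The function names `extract_records` and `_read_name` are hypothetical, and only A, AAAA, and CNAME answers are handled.

```python
import socket
import struct


def _read_name(msg: bytes, offset: int) -> tuple[str, int]:
    """Decode a (possibly compressed) DNS name; return (name, next offset)."""
    labels = []
    end = None  # offset just past the name at its original position
    hops = 0
    while True:
        length = msg[offset]
        if (length & 0xC0) == 0xC0:  # compression pointer
            if end is None:
                end = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            hops += 1
            if hops > 32:
                raise ValueError("too many compression pointers")
            continue
        offset += 1
        if length == 0:  # root label terminates the name
            break
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), (end if end is not None else offset)


def extract_records(msg: bytes) -> list[tuple[str, str, str]]:
    """Return (owner, type, value) for A, AAAA and CNAME answers in a response."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    if not (flags & 0x8000):  # QR bit clear: a query, not a response
        return []
    offset = 12
    for _ in range(qdcount):  # skip the question section
        _, offset = _read_name(msg, offset)
        offset += 4  # QTYPE + QCLASS
    records = []
    for _ in range(ancount):
        owner, offset = _read_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", msg[offset:offset + 10])
        offset += 10
        rdata = msg[offset:offset + rdlength]
        if rtype == 1 and rdlength == 4:
            records.append((owner, "A", socket.inet_ntoa(rdata)))
        elif rtype == 28 and rdlength == 16:
            records.append(
                (owner, "AAAA", socket.inet_ntop(socket.AF_INET6, rdata)))
        elif rtype == 5:
            target, _ = _read_name(msg, offset)
            records.append((owner, "CNAME", target))
        offset += rdlength
    return records
```

The returned tuples are one possible form of the domain name records that could then be provided for inclusion in the network traffic flow data.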
Still other objects and advantages of the present invention will in part be obvious and will in part be apparent from the specification, and the scope of the invention will be indicated in the claims.
The present invention accordingly comprises the features of construction, combinations of elements, and arrangement of parts, and the various steps and the relation of one or more of such steps with respect to each of the others, all as exemplified in the constructions herein set forth, and the scope of the invention will be indicated in the claims.
The inventive embodiments are described in greater detail hereinafter with reference to the accompanying drawing figures, in which:
According to embodiments of the present invention, a system can augment network traffic flow reports (e.g., NetFlow or IPFIX reports) with original DNS query information or context that is determined in real time (e.g., as IPv4 and/or IPv6 connections occur), particularly at the time those queries/connection requests are made.
Process 350 can include extracting the IP address(es) from the packet (step 352) and analyzing the contents in the packet to determine if the packet corresponds to a TCP session (step 354). If the packet is for a TCP session, process 350 can include extracting the TCP session parameters (step 356) and determining whether the session is for a newly established connection (step 358). If the session is for a newly established connection, process 350 can include querying the DNS cache(s) with the extracted IP address (step 360). If a result to the query is available (step 362), process 350 can include querying the DNS cache(s) for the result (step 364), and proceeding to B to return to step 316 of process 300. In some embodiments, querying of the DNS cache for result(s) can be repeated, e.g., until the last result is retrieved. If there is no result available at step 362, process 350 can include creating a new entry in one or more network traffic flow reports or data (step 374)—for example, by adding time information, the IP address, and DNS name if available—and proceeding to C to return to step 316 of process 300.
Returning to step 354, if the packet is not for a TCP session, process 350 can include determining or checking the last time the IP address was active (step 368). If the last time the IP address was active was a relatively long time ago (at step 370), process 350 can include closing the record for that IP address if it is open (step 372), proceeding to step 374, and continuing on the process therefrom as shown. On the other hand, if the last time the IP address was active was relatively recently (at step 370), process 350 can include updating traffic counters for that IP record (step 378) and determining whether the time of the record is older than a reporting period (step 380). If the time of the record is older than the reporting period, process 350 can include recreating the record (step 382) and proceeding to D to return to step 316 of process 300. If the time of the record is not older than the reporting period, process 350 can proceed to E to return directly to step 316 of process 300.
Returning to step 358, if the session is not for a newly established connection, process 350 can include determining whether the TCP session is closed (step 376). If the TCP session is closed, process 350 can proceed to step 372; otherwise, the process can proceed to step 378.
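The branching of process 350 described above can be summarized, purely as an illustrative aid, by the following Python sketch. It does not perform any packet processing; it only returns the sequence of step numbers and exit connectors (B, C, D, E) that the textual description gives for a set of conditions. The `PacketFacts` class, its field names, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PacketFacts:
    """Hypothetical summary of the conditions tested by process 350."""
    is_tcp: bool                            # step 354
    new_connection: bool = False            # step 358
    cache_hit: bool = False                 # step 362
    session_closed: bool = False            # step 376
    idle_long: bool = False                 # step 370
    record_older_than_period: bool = False  # step 380


def _update_counters_path(p: PacketFacts) -> list:
    """Steps 378-382: update counters, then recreate the record if stale."""
    if p.record_older_than_period:
        return [378, 380, 382, "D"]
    return [378, 380, "E"]


def process_350_path(p: PacketFacts) -> list:
    """Return the steps visited for one packet, ending in an exit connector."""
    path = [352, 354]
    if p.is_tcp:
        path += [356, 358]
        if p.new_connection:
            path += [360, 362]
            if p.cache_hit:
                return path + [364, "B"]
            return path + [374, "C"]
        path.append(376)
        if p.session_closed:
            return path + [372, 374, "C"]
        return path + _update_counters_path(p)
    path += [368, 370]
    if p.idle_long:
        return path + [372, 374, "C"]
    return path + _update_counters_path(p)
```

For instance, a non-TCP packet for a recently active address whose record is still within the reporting period would follow steps 352, 354, 368, 370, 378, and 380 before exiting at E.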
According to various embodiments, the system can be implemented as an algorithm, and more specifically, as an extension to network flow capture software (e.g., NetFlow). The algorithm can (i) enable inspection of DNS answer traffic [e.g., more deeply or with greater focus than other data], (ii) push answer information into a prioritized cache, (iii) mine or “travel” the cache in reverse order to recover original DNS name information used at or about the time of the requests, and (iv) add the recovered original DNS name information to the network flow report.
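A minimal Python sketch of items (ii) through (iv) is given below, under stated assumptions. It assumes that “prioritized” can be approximated by recency (newest answers are consulted first), that answers have already been reduced to (owner, type, value) tuples, and that the cache is not keyed by requesting host. The class `DnsAnswerCache` and the function `augment` are hypothetical names and do not reflect any particular flow capture software.

```python
from collections import deque
from typing import Optional


class DnsAnswerCache:
    """Bounded, time-ordered store of observed DNS answers (newest last)."""

    def __init__(self, max_entries: int = 10_000) -> None:
        self._entries = deque(maxlen=max_entries)

    def push(self, ts: float, owner: str, rtype: str, value: str) -> None:
        """Record one answer: owner name, record type, and record value."""
        self._entries.append((ts, owner, rtype, value))

    def _latest_owner(self, rtypes, value, not_after) -> Optional[str]:
        # Walk the cache newest-first, ignoring answers seen after the flow.
        for ts, owner, rtype, val in reversed(self._entries):
            if ts <= not_after and rtype in rtypes and val == value:
                return owner
        return None

    def original_name(self, ip: str, flow_ts: float) -> Optional[str]:
        """Travel from an IP back through CNAME aliases to the queried name."""
        name = self._latest_owner(("A", "AAAA"), ip, flow_ts)
        if name is None:
            return None
        seen = {name}
        while True:
            alias = self._latest_owner(("CNAME",), name, flow_ts)
            if alias is None or alias in seen:  # chain ends or loops
                return name
            seen.add(alias)
            name = alias


def augment(line: str, dst_ip: str, name: Optional[str]) -> str:
    """Insert the recovered name after the destination IP of a flow line."""
    if name is None:
        return line
    return line.replace("->" + dst_ip, f"->{dst_ip} ({name})", 1)
```

Because the lookup ignores answers observed after the flow timestamp, the same address can map to different original names at different times, which is the behavior needed when several aliases share one server.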
An example of a traffic line item from a network flow report augmented with original DNS name information is as follows: 2016-02-26 32:15:32.434 1.030 TCP 192.168.0.1:42343->10.0.226.24 (jenkins.avg-labs.com):80 X XXXXX X. An example of the prioritized DNS cache contents is as follows:
According to an exemplary embodiment, the system can generate network traffic flows and link connections (e.g., HTTP connections) revealed by the flows to relevant DNS names at or about the time the connections were made. In certain embodiments, the system can be implemented as a special DNS module that extends an existing flow capturing software application. The module can, for example, be configured to:
An example of a network flow report (e.g., augmented according to one or more of the processes shown in
It should be understood that the steps shown in processes 300, 350, 500, and 600 are merely illustrative and that existing steps may be modified or omitted, additional steps may be added, and the order of certain steps may be altered.
Accordingly, embodiments of the present invention advantageously provide network flows that include the original requested DNS names for some or all of the reported connection requests. This enables network analysis personnel, automation tools, or the like to optimize network bandwidth (e.g., for individual users) and identify network security issues. It is to be appreciated that, in certain embodiments, the augmented network flow reports can be useful for detecting malicious programs, such as unauthorized smartphone apps. The novel system described herein, including the supplementation of network flows with DNS names from cache, can overcome the disadvantages of existing DNS caching solutions, which do not effect grouping by individual hosts.
It should be understood that the foregoing subject matter may be embodied as devices, systems, methods and/or computer program products. Accordingly, some or all of the subject matter may be embodied in hardware and/or in software (including firmware, resident software, micro-code, state machines, gate arrays, etc.). Moreover, the subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. A computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The computer-usable or computer-readable medium may be for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Computer-readable media may comprise computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technology that can be used to store information and that can be accessed by an instruction execution system.
Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media (wired or wireless). A modulated data signal can be defined as a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
When the subject matter is embodied in the general context of computer-executable instructions, the embodiment may comprise program modules, executed by one or more systems, computers, or other devices. Generally, program modules include routines, programs, objects, components, data structures and the like, which perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
Those of ordinary skill in the art will understand that the term “Internet” used herein refers to a collection of computer networks (public and/or private) that are linked together by a set of standard protocols (such as TCP/IP and HTTP) to form a global, distributed network. While this term is intended to refer to what is now commonly known as the Internet, it is also intended to encompass variations that may be made in the future, including changes and additions to existing protocols.
It will thus be seen that the objects set forth above, among those made apparent from the preceding description and the accompanying drawings, are efficiently attained and, since certain changes can be made in carrying out the above methods and in the constructions set forth for the systems without departing from the spirit and scope of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
It is also to be understood that the following claims are intended to cover all of the generic and specific features of the invention herein described, and all statements of the scope of the invention, which, as a matter of language, might be said to fall therebetween.
This application claims the benefit of U.S. Provisional Patent Application No. 62/346,170, filed on Jun. 6, 2016, the disclosure of which is hereby incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62346170 | Jun 2016 | US