The disclosure relates generally to a method and system for detecting botnets.
Blacklists are well known and generally act as an access list for a computer network. Email addresses, users, passwords, URLs, IP addresses, domain names, file hashes, etc. can be placed on a company's blacklist, and blacklisted entries will be denied access to the company's computer network. Many commercial anti-virus products include a blacklist.
Network traffic flow analysis of computer networks is also well known. Traffic flow analysis is the analysis of the flow of digital data as it travels from one node (a source address) to another node (a destination address). Network traffic flow data here includes NetFlow, DNS cache, DNS sinkhole traffic flow, etc. Such analyses have been used to detect malware and the like.
Botnets are also known and consist of a plurality of computer systems that work in a coordinated manner. Botnets can exist for legal purposes, but are often used for nefarious purposes, in which case each computer resource of the botnet may be infected with malicious code.
None of the existing malware and virus detection systems uses both blacklists and network traffic flow analysis data to recursively detect botnets.
The disclosure is particularly applicable to a malware detection system that incorporates the network traffic flow botnet detection system and method, and it is in this context that the disclosure will be described. It will be appreciated, however, that the system and method have greater utility, since the network traffic flow botnet detection system and method may receive the network traffic flow data from other sources and may operate as a stand-alone system.
In the network shown in
Each C&C device 10 may include command and control (C&C) infrastructure consisting of servers and other technical infrastructure used to control malware in general, and, in particular, botnets. Command and control devices 10 may be either directly controlled by the malware operators, or themselves run on hardware compromised by malware.
A zombie node may be a computer connected to the Internet that has been compromised by a hacker, computer virus or trojan horse and can be used to perform malicious tasks of one sort or another under remote direction. Botnets of zombie computers are often used to spread e-mail spam and launch denial-of-service attacks. Most owners of zombie computers are unaware that their system is being used in this way. Because the owner tends to be unaware, these computers are metaphorically compared to zombies. A coordinated DDoS attack by multiple botnet machines also resembles a zombie horde attack. A non-zombie node is a node that is not a zombie node and thus has not been compromised by a hacker, computer virus or trojan horse.
The botnet detection system 23 may be implemented in hardware and/or software. The botnet detection system 23 may include a botnet detector 27, a white-list generator 28 and a white-list scorer 29. Each of these elements 27-29 may be implemented using hardware or software. When each element is implemented in hardware, it may be an FPGA, programmed microcontroller, microprocessor, state machine and the like and may perform the operations and functions described below with reference to
As described in more detail below, the botnet detection system 23 may receive network traffic flow data such as NetFlow data, DNS data, etc. (including IP addresses for known botnets) and is able to detect botnets and malicious IP addresses as detailed below in
Thus, as shown in
The method may then perform an intersection operation (∩ as shown in
The method may then determine whether the number of MatchedFlow.src_ip addresses corresponding to a MatchedFlow.dst_ip is more than a predetermined number M (310). M is a threshold on the number of victim nodes communicating with a command and control (“C&C”) server. The value of M is 3 in the example shown in
The method may then match each NetFlow destination IP address against the known C&C server IP addresses (318) to generate a set difference between the NetFlow destination IP addresses and the known C&C server IP addresses. The method may then identify the matching NetFlow IP addresses (320) as shown in
The method may then generate a list of NetFlow IP addresses that do not match the white-list (326); these are new C&C server IP addresses (5.5.5.5 in this example), and they may correspond to the Level 2 IP addresses shown in
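The matching step and the threshold test at step 310 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: each NetFlow record is reduced to a (src_ip, dst_ip) pair, the blacklist is a set of known C&C IP addresses, sources are counted as distinct addresses, and the function names `match_flows` and `victims_by_cc` are hypothetical.

```python
from collections import defaultdict

# Assumption: a NetFlow record is reduced to a (src_ip, dst_ip) pair.
Flow = tuple[str, str]


def match_flows(flows: list[Flow], known_cc: set[str]) -> list[Flow]:
    """Keep only flows whose destination is a known C&C IP (the "MatchedFlow" set)."""
    return [flow for flow in flows if flow[1] in known_cc]


def victims_by_cc(matched: list[Flow], m: int = 3) -> dict[str, set[str]]:
    """Map each matched dst_ip to its distinct src_ips, keeping those with more than m sources.

    Hypothetical helper; m plays the role of the threshold M (3 in the text's example).
    """
    sources: dict[str, set[str]] = defaultdict(set)
    for src_ip, dst_ip in matched:
        sources[dst_ip].add(src_ip)
    return {dst: srcs for dst, srcs in sources.items() if len(srcs) > m}


# Usage with made-up addresses: 1.1.1.1 is contacted by four sources, 2.2.2.2 by one.
flows = [
    ("10.0.0.1", "1.1.1.1"),
    ("10.0.0.2", "1.1.1.1"),
    ("10.0.0.3", "1.1.1.1"),
    ("10.0.0.4", "1.1.1.1"),
    ("10.0.0.5", "2.2.2.2"),
    ("10.0.0.9", "9.9.9.9"),
]
matched = match_flows(flows, {"1.1.1.1", "2.2.2.2"})
print(sorted(victims_by_cc(matched, m=3)))  # ['1.1.1.1']
```

Only 1.1.1.1 passes, because it is contacted by more than M = 3 distinct sources; counting distinct sources rather than raw flows is a design choice of this sketch.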
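The set difference of step 318 and the white-list filtering of step 326 can be sketched as below. This Python sketch rests on assumptions not spelled out in the text above: the candidate destinations are taken to be those contacted by the victim nodes found at step 310, flows are (src_ip, dst_ip) pairs, and `new_cc_addresses` is a hypothetical name. The addresses are made up and chosen so that the result is 5.5.5.5, echoing the text's example.

```python
# Assumption: a NetFlow record is reduced to a (src_ip, dst_ip) pair.
Flow = tuple[str, str]


def new_cc_addresses(
    flows: list[Flow],
    victims: set[str],
    known_cc: set[str],
    whitelist: set[str],
) -> set[str]:
    """Return candidate new C&C IPs contacted by victim nodes (hypothetical helper)."""
    # Destination IPs of flows that originate from a victim (zombie) node.
    contacted = {dst_ip for src_ip, dst_ip in flows if src_ip in victims}
    # Set difference: drop destinations that are already known C&C servers.
    unknown = contacted - known_cc
    # Drop white-listed destinations; whatever remains is a new C&C candidate.
    return unknown - whitelist


flows = [
    ("10.0.0.1", "1.1.1.1"),
    ("10.0.0.1", "5.5.5.5"),
    ("10.0.0.2", "5.5.5.5"),
    ("10.0.0.2", "8.8.8.8"),
    ("10.0.0.9", "6.6.6.6"),  # not a victim, so ignored
]
print(new_cc_addresses(flows, {"10.0.0.1", "10.0.0.2"}, {"1.1.1.1"}, {"8.8.8.8"}))
# {'5.5.5.5'}
```

One way to obtain the next level of the recursion, under the same assumptions, is to feed the returned addresses back in as the known C&C set and repeat the threshold step; that loop is omitted here.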
The method may begin with an initial set of nodes C=A∪B. The method may receive a set of NetFlow data (402) wherein each piece of NetFlow data includes a source IP address src_ip and a destination IP address dst_ip as shown in
Once the numbers of zombie nodes and non-zombie nodes are calculated, the method may calculate a maliciousness score for an unknown IP address, where, in one embodiment, score = 1 − M/(M + N) (610). In the example in
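The score at step 610 can be checked with a few lines of Python. This is a sketch: the function name is hypothetical, m and n are simply the counts M and N in the formula (the function does not assume which of them counts zombie nodes), and raising an error when M + N = 0 is an assumption, since the formula divides by zero there.

```python
def maliciousness_score(m: int, n: int) -> float:
    """Compute score = 1 - M / (M + N) for node counts m and n (hypothetical helper)."""
    if m < 0 or n < 0:
        raise ValueError("node counts must be non-negative")
    if m + n == 0:
        # Assumption: with no observed nodes the score is left undefined.
        raise ValueError("score is undefined when M + N == 0")
    return 1 - m / (m + n)


print(maliciousness_score(1, 3))  # 0.75
print(maliciousness_score(3, 1))  # 0.25
```

Algebraically, 1 − M/(M + N) equals N/(M + N), so the score lies between 0 and 1 and grows as N grows relative to M.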
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical applications, to thereby enable others skilled in the art to best utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated.
The system and method disclosed herein may be implemented via one or more components, systems, servers, appliances, other subcomponents, or distributed between such elements. When implemented as a system, such systems may include and/or involve, inter alia, components such as software modules, general-purpose CPU, RAM, etc. found in general-purpose computers. In implementations where the innovations reside on a server, such a server may include or involve components such as CPU, RAM, etc., such as those found in general-purpose computers.
Additionally, the system and method herein may be achieved via implementations with disparate or entirely different software, hardware and/or firmware components, beyond that set forth above. With regard to such other components (e.g., software, processing components, etc.) and/or computer-readable media associated with or embodying the present inventions, for example, aspects of the innovations herein may be implemented consistent with numerous general purpose or special purpose computing systems or configurations. Various exemplary computing systems, environments, and/or configurations that may be suitable for use with the innovations herein may include, but are not limited to: software or other components within or embodied on personal computers, servers or server computing devices such as routing/connectivity components, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, consumer electronic devices, network PCs, other existing computer platforms, distributed computing environments that include one or more of the above systems or devices, etc.
In some instances, aspects of the system and method may be achieved via or performed by logic and/or logic instructions including program modules, executed in association with such components or circuitry, for example. In general, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular instructions herein. The inventions may also be practiced in the context of distributed software, computer, or circuit settings where circuitry is connected via communication buses, circuitry or links. In distributed settings, control/instructions may occur from both local and remote computer storage media including memory storage devices.
The software, circuitry and components herein may also include and/or utilize one or more types of computer readable media. Computer readable media can be any available media that is resident on, associable with, or can be accessed by such circuits and/or computing components. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and can be accessed by a computing component. Communication media may comprise computer readable instructions, data structures, program modules and/or other components. Further, communication media may include wired media such as a wired network or direct-wired connection; however, no media of any such type herein includes transitory media. Combinations of any of the above are also included within the scope of computer readable media.
In the present description, the terms component, module, device, etc. may refer to any type of logical or functional software elements, circuits, blocks and/or processes that may be implemented in a variety of ways. For example, the functions of various circuits and/or blocks can be combined with one another into any other number of modules. Each module may even be implemented as a software program stored on a tangible memory (e.g., random access memory, read only memory, CD-ROM memory, hard disk drive, etc.) to be read by a central processing unit to implement the functions of the innovations herein. Or, the modules can comprise programming instructions transmitted to a general purpose computer or to processing/graphics hardware via a transmission carrier wave. Also, the modules can be implemented as hardware logic circuitry implementing the functions encompassed by the innovations herein. Finally, the modules can be implemented using special purpose instructions (SIMD instructions), field programmable logic arrays or any mix thereof which provides the desired level of performance and cost.
As disclosed herein, features consistent with the disclosure may be implemented via computer-hardware, software and/or firmware. For example, the systems and methods disclosed herein may be embodied in various forms including, for example, a data processor, such as a computer that also includes a database, digital electronic circuitry, firmware, software, or in combinations of them. Further, while some of the disclosed implementations describe specific hardware components, systems and methods consistent with the innovations herein may be implemented with any combination of hardware, software and/or firmware. Moreover, the above-noted features and other aspects and principles of the innovations herein may be implemented in various environments. Such environments and related applications may be specially constructed for performing the various routines, processes and/or operations according to the invention or they may include a general-purpose computer or computing platform selectively activated or reconfigured by code to provide the necessary functionality. The processes disclosed herein are not inherently related to any particular computer, network, architecture, environment, or other apparatus, and may be implemented by a suitable combination of hardware, software, and/or firmware. For example, various general-purpose machines may be used with programs written in accordance with teachings of the invention, or it may be more convenient to construct a specialized apparatus or system to perform the required methods and techniques.
Aspects of the method and system described herein, such as the logic, may also be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (“PLDs”), such as field programmable gate arrays (“FPGAs”), programmable array logic (“PAL”) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits. Some other possibilities for implementing aspects include: memory devices, microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc. Furthermore, aspects may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. The underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (“MOSFET”) technologies like complementary metal-oxide semiconductor (“CMOS”), bipolar technologies like emitter-coupled logic (“ECL”), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, and so on.
It should also be noted that the various logic and/or functions disclosed herein may be enabled using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media), but, again, do not include transitory media. Unless the context clearly requires otherwise, throughout the description, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
Although certain presently preferred implementations of the invention have been specifically described herein, it will be apparent to those skilled in the art to which the invention pertains that variations and modifications of the various implementations shown and described herein may be made without departing from the spirit and scope of the invention. Accordingly, it is intended that the invention be limited only to the extent required by the applicable rules of law.
While the foregoing has been with reference to a particular embodiment of the disclosure, it will be appreciated by those skilled in the art that changes in this embodiment may be made without departing from the principles and spirit of the disclosure, the scope of which is defined by the appended claims.
Number | Date | Country |
---|---|---|
20170374084 A1 | Dec 2017 | US |