The present disclosure relates to electronic data network security.
Computer systems, networks and data centers are exposed to a constant and differing variety of attacks that expose vulnerabilities of such systems in order to compromise their security and/or operation. As an example, various forms of malicious software program attacks include viruses, worms, Trojan horses and the like that computer systems can obtain over a network such as the Internet. Quite often, users of such computer systems are not even aware that such malicious programs have been obtained within the computer system. Once resident within a given computer, a malicious program that executes might disrupt operation of that computer to a point of inoperability and/or might spread itself to other computers within a network or data center by exploiting vulnerabilities of the computer's operating system or resident application programs. Other malicious programs, such as “Spyware,” might operate within a computer to secretly extract and transmit information within the computer to remote computer systems for various suspect purposes.
To combat the proliferation of malicious software program attacks, Anti-Malware (AM) and Anti-Virus (AV) scanners have been widely deployed in different security appliances and Intrusion Prevention Systems (IPSs) to detect known malicious software, and to thereafter block or filter such threats. Compared with, e.g., universal resource locator (URL) and reputation-based malware filtering, which rely broadly on the source of content by monitoring, e.g., an IP address or a domain name, AM/AV scanning may enjoy a lower false positive rate by employing, e.g., well-defined signatures on the actual content that is responsible for malicious behavior. On the other hand, AM/AV scanning can be expensive in terms of central processing unit (CPU) and memory usage which, in many instances, especially as electronic data network traffic grows at increasing rates, makes it impractical to apply AM/AV scanning to all of the content being carried by network traffic, or even destined for a single given computer within that network.
Overview
Systems and methods are described that enable probabilistic application of data traffic scanning in an effort to catch malicious software or code being carried by the data traffic. The methodology and systems operate by monitoring data traffic in a data network via an interface with the electronic data network, calculating a first conditional probability that content in first given data traffic being monitored is malicious, calculating a second conditional probability that content in second given data traffic being monitored is malicious, ranking the first and second conditional probabilities resulting in ranked conditional probabilities, and performing at least one of anti-virus (AV) or anti-malware (AM) scanning of the content of the first or second given data traffic depending on whose conditional probability is ranked higher in the ranked conditional probabilities. In one embodiment, the functionality for computing or determining the conditional probabilities is embodied in computer executable instructions or code that is hosted by a network device or appliance, which can be deployed in one of several different locations in a network topology.
Embodiments described herein include a computer or other processing system configured to execute a probabilistic application of Anti-Malware/Anti-Virus (AM/AV) scanning. The number of security threats introduced by electronic network (e.g., world wide web) traffic is reaching epidemic proportions. Traditional gateway defenses are proving inadequate against a variety of malware, leaving corporate electronic data networks exposed. The speed, variety and damage potential of malware attacks highlight the utility of a robust, secure platform to protect the enterprise network perimeter or, at the very least, individual end point or end user devices within that perimeter.
Network 101 is also connected to an edge router 110 that couples network 101 to a wide area network (WAN) 115 such as the Internet and that allows communication between the end point devices 105 and other computers, servers and other devices worldwide. Such other computers, servers, etc. (not shown) may be disposed, for example, inside a data center 120 that may be a physical data center or a virtual data center that functions, from the perspective of an end point device 105, as a dedicated physical data center.
Also shown in
In addition, yet another device shown in computer architecture 100 is an administration device 140, which hosts AM/AV scanning administration logic 250. Administration device 140 may be a stand alone computer or device as shown or may be incorporated into any one of the network connected devices shown in
In the same way, although AM/AV scanners 125 shown in
The functionality of AM/AV scanning application logic 200 will now be described. At a high level, a main function of AM/AV scanning application logic 200 is to identify, based on probabilistic methodologies, which electronic data traffic is to be scanned by a given one of the AM/AV scanners 125, and where appropriate, to determine, in a probabilistic manner, which additional AM/AV scanner or scanners 125 may also be applied to the electronic data traffic after a given first one of the AM/AV scanners 125 has completed its scan operations.
A typical AM/AV scanner can accommodate approximately 100 scanning requests per second, i.e., a request to scan content carried by given electronic data traffic. In today's network environment, however, in order to effectively scan substantially all electronic data traffic, a scanner would have to support potentially thousands of requests per second. To address this problem, some scanning implementations scan only higher risk but lower volume file types, such as executable (.exe.) and zipped (.zip) file types. However, the trend is that malware and viruses are increasingly embedded in many other file types including hyper text markup language (HTML) files, JavaScript files, portable document format (PDF) documents and other image format files
To address the forgoing, AM/AV scanning application logic 200 provides a dynamic electronic data traffic scanning approach that determines which high-risk traffic to scan based on a probabilistic prioritization, selects which AM/AV scanner(s) 125 to user if there are two or more AM/AV scanners available for use, and monitors overall system load to dynamically scan more content when a given one or more of AM/AV scanners 125 may be idle. AM/AV scanner application logic 200 may also periodically cause AM/AV scanners 125 to scan random electronic data traffic, determine whether that traffic includes malicious software programs, and then based on results of the determination, provide a feedback mechanism by which future probabilistic prioritization techniques can be tuned or optimized.
More specifically, probabilistic prioritization, in accordance with principles of embodiments described herein, is based on conditional probability. In the case of identifying malicious software in electronic data traffic, the conditional probability of an event B (i.e., existence of malicious software programs) is the probability that the event will occur given the knowledge that an event A (i.e., selected characteristics of the electronic data traffic are present) has already occurred. This probability is written as P(B|A) and referred to as “the probability of B given A.” In the instant context, this would be referred to as the probability of finding malicious code or software in given electronic data traffic given the attributes or characteristics of that electronic data traffic.
If events A and B are not independent, then the probability of the intersection of A and B (the probability that both events occur) is defined by P(A and B)=P(A)P(B|A). From this definition, the conditional probability P(B|A) is obtained by dividing by P(A):
Event “A” in the context of characteristics of the electronic data traffic might include, for example, the URL from which the electronic data traffic is being received, and/or a corresponding reputation thereof, a category of the content being received (e.g., gambling pages, or cooking recipes), or type of file be carried by the electronic data traffic (e.g., .exe, .pdf, etc.), a user-agent type (e.g., an application used to request the content), geographical of either or both a server from which the electronic data traffic is being received or a user or end point device that is to receive the electronic data traffic, among many other possibilities.
In one implementation, AM/AV scanning application logic 200 has a set of a priori knowledge about the likelihood (probability) of given electronic data traffic carrying malicious software given certain characteristics of that given electronic data traffic. As the traffic passes through host 130, AM/AV scanning application logic 200 operates on or processes the traffic and generates a conditional probability that the traffic is carrying malicious software. It is noted that there may be multiple characteristics that AM/AV scanning application logic 200 considers in determining the conditional probability. As such, a final conditional probability that given electronic data traffic carries malicious software may be a combination of multiple conditional probabilities.
That is, rather than setting hard thresholds or limiting scanning of content carried by electronic data traffic to certain types of files, embodiments described herein rank the electronic data traffic based on a plurality of characteristics and resulting conditional probabilities. Scanning can then take place on the electronic data traffic (i.e., the content therein) with the highest rankings. In this way, scanner 125 can be put to use in the a more optimized way by passing content through the scanners that is determined to more likely carry malicious software.
Thus, in accordance with an embodiment, AM/AV scanning application logic 200 first selects an available and appropriate scanner 125 to employ for a first pass of scanning. Then, if another scanner might be available and it is determined that such a scanner might catch malicious software that the first scanner did not detect, then AM/AV scanning application logic 200 may further cause the electronic data traffic being processed to be processed by a second or even third, etc. scanner.
Still with reference to
With the additional scanner(s) so selected, the same electronic data traffic can be passed through the additional scanner(s) such that an overall catch rate of malicious code can be improved.
To keep the system and methodology relevant and updated over time, a periodic feedback mechanism is employed.
Referring to
At 712, it is determined whether malicious software has been found via the scanning procedure. If not, operation 710 is repeated. If malicious code is found, then at 714, relevant characteristics of the electronic data traffic are captured. These characteristics might include, e.g., a URL, the category of the content being carried or the type of file be carried, among many other possible characteristics. At 716 this information (namely that malicious software was (or was not) found in electronic data traffic having the captured characteristics) is broadcast or otherwise made available to other instances of AM/AV scanning application logic 200 that may be operating within other network architectures. In this way, different instances of AM/AV scanning application logic 200 can share information with one another, enabling individual instances to be as up to date as may be practicable so that the conditional probabilities that are calculated or as accurate as possible.
Reference has been made to portions or segments of electronic data traffic. In the context of the embodiments described herein, those segments or portions can be based on information gleaned from layers 3-7 of Open Systems Interconnection (OSI) model wherein the headers and payloads of various packets or frames of the electronic data traffic at these several layers are inspected for maliciousness and for the characteristics that are used to compute the conditional probabilities.
Thus, as explained, electronic data traffic transiting a security appliance or host is dynamically selected to be scanned based on a function of the probability that the traffic contains malicious software or code. In one embodiment, scanners are employed at their maximum throughput such that the greatest amount of electronic data traffic can be scanned without substantially impacting an expected quality of service by an end user. Scanners can also be used in sequence in an attempt to capture as much malicious code as possible. Periodically, a random sample of electronic data traffic is scanned by one or more scanners and information gleaned from such scans is distributed to network appliances that operate in accordance with the principles described herein.
Reference is now made to
Operation 810 includes monitoring electronic data traffic in an electronic data network via an electronic interface with the electronic data network. Operation 812 includes calculating a first conditional probability that content in first given electronic data traffic being monitored is malicious. Operation 814 includes calculating a second conditional probability that content in second given electronic data traffic being monitored is malicious.
Operation 816 includes ranking the first and second conditional probabilities resulting in ranked conditional probabilities, and operation 818 includes performing at least one of anti-virus (AV) or anti-malware (AM) scanning of the content of the first or second given electronic data traffic depending on whose conditional probability is ranked higher in the ranked conditional probabilities.
Operation 820 includes performing at least one of AV or AM scanning of content of third given electronic data traffic, determining, as a result of the AV or AM scanning, whether the content of the third given electronic data traffic is malicious, and when the content of the third given electronic data traffic is determined to be malicious, capturing characteristics of the third given electronic data traffic, wherein the characteristics are employed to calculate conditional probabilities that content of still other given electronic data traffic is malicious
Operations may further include selecting a first one of a plurality of scanners for performing the at least one of AV or AM scanning, wherein selecting is based on an efficacy value associated with respective ones of the plurality of scanners for a given type of content that is being carried by the first or second given electronic data traffic whose conditional probability is ranked higher in the ranked conditional probabilities.
Operations may still further comprise selecting a second one of the plurality of scanners for a performing a subsequent operation of performing at least one of AV or AM scanning after scanning by the first one of the plurality of scanners, wherein selecting of the second one of the plurality of scanners is based on a catch rate of malicious content not caught by the first one of the plurality of scanners.
In one embodiment, the functionality for calculating the conditional probabilities and controlling which scanners are to be employed for scanning may be deployed at a network security appliance that is logically disposed between an end-point device and a firewall that is in communication with the Internet. This functionality may also be deployed with a cloud computing environment.
Although the apparatus, system and method are illustrated and described herein as embodied in one or more specific examples, it is nevertheless not intended to be limited to the details shown, since various modifications and structural changes may be made therein without departing from the scope of the apparatus, system, and method and within the scope and range of equivalents of the claims. A Data Center can represent any location supporting capabilities enabling service delivery that are advertised. A Provider Edge Routing Node represents any system configured to receive, store or distribute advertised information as well as any system configured to route based on the same information. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the apparatus, system, and method, as set forth in the following.
Number | Name | Date | Kind |
---|---|---|---|
6851058 | Gartside | Feb 2005 | B1 |
7882560 | Kraemer et al. | Feb 2011 | B2 |
8370943 | Dongre et al. | Feb 2013 | B1 |
20120110667 | Zubrilin et al. | May 2012 | A1 |
20130291109 | Staniford et al. | Oct 2013 | A1 |
Entry |
---|
Anti-Virus Comparative, Mar. 2012, Retrieved from the Internet <URL: .av-comparatives.org/images/stories/test/ondret/avc—fd—mar2012—intl—en.pdf>, pp. 1-12 as printed. |
Oberheide, CloudAV: N-version Antivirus in the Network Cloud, 2008, Retrieved from the Internet <URL: jon.oberheide.org/files/usenix08-clouday.pdf>, pp. 1-16 as printed. |
Grover, “A fast quantum mechanical algorithm for database search”, Proceedings, STOC, 1996, pp. 1-8. |
Press, “Strong profiling is not mathematically optimal for discovering rare malfeasors”, Los Alamos National Laboratory, 2006, 18 pages. |
Montanaro, “Quantum search with advice”, Department of Computer Science, University of Bristol, Sep. 14, 2009, pp. 1-14. |
Hoogstrate et al., “Minimizing the average number of inspections for detecting rare items in finite populations”, 2011 IEEE Computer Society, pp. 203-208. |