This invention relates generally to enhancing the performance of malicious code detection methods for proxy server computers. More specifically, this invention relates to methods for selectively passing forward file contents that have previously been scanned for the presence of malicious code.
During the brief history of computers, system administrators and users have been plagued by attacking agents such as viruses, worms, and Trojan Horses, which are designed to disable host computer systems or propagate themselves to connected systems.
In recent years, two developments have increased the threat posed by these attacking agents. Firstly, increased dependence on computers to perform mission critical business tasks has increased the economic cost associated with system downtime. Secondly, increased interconnectivity among computers has made it possible for attacking agents to spread to a large number of systems in a matter of hours.
Many network systems employ proxy servers to provide additional protection against attacking agents. These proxy servers manage interactions such as Hypertext Transfer Protocol (HTTP) communications between client systems and outside systems. This setup allows network administrators to control and monitor the sites that users access, and to institute an additional layer of protection by configuring the proxy server to scan any incoming files for infection by attacking agents.
However, this additional layer of protection can place significant performance demands on the proxy and greatly increase transmission latency. Many attacking agents can be detected only after a file has been fully downloaded. Detection of these agents typically requires access to non-sequential sections of the file as well as the ability to emulate the execution of the file and monitor its output. The time required to fully download a large file and scan it before beginning to transmit the file to a client can generate frustrating delays for users of client systems.
What is needed is a method for reducing the latency of files transmitted through scanning proxy servers.
The present invention comprises methods, systems, and computer readable media for managing transmission of a requested computer file (140) from a remote host computer (125) to a client computer (120). A proxy server computer (110) receives a first chunk (315) of the requested computer file (140). The proxy server computer (110) generates a hash of the chunk (315) and compares the hash to a hash of a chunk of a previously downloaded file. If the two hashes are identical, the chunk (315) of the requested computer file (140) is passed to the client computer (120).
These and other more detailed and specific objects and features of the present invention are more fully disclosed in the following specification, reference being had to the accompanying drawings, in which:
The present invention comprises systems, methods, and computer readable media for verifying that a computer file 140 is free of malicious code before passing the computer file 140 to a client computer 120. As used herein, the term “malicious code” refers to any program, module, or piece of code that enters a computer without an authorized user's knowledge and/or without an authorized user's consent. The term “attacking agent” includes Trojan Horse programs, worms, viruses, and other such insidious software that insert malicious code into a computer file 140. An attacking agent may include the ability to replicate itself and compromise other computer systems.
The proxy server computer 110 manages the transfer of files 140 from the remote host computer 125 to the client computer 120. In one embodiment, the proxy server computer 110 shares an internal Local Area Network (LAN) or Wide Area Network (WAN) with the client computer 120, and controls all access between the client computer 120 and computers outside the internal network. In an alternate embodiment, the client computer 120 communicates with the proxy server computer 110 through the Internet and uses the proxy server computer 110 to provide an additional layer of security.
When the client computer 120 attempts to access a computer file 140 stored on the remote host computer 125, it transmits a request to the proxy server computer 110 that includes a Uniform Resource Locator (URL) for the computer file 140. The proxy server computer 110 receives the request and transmits a conventional file retrieval request to the remote host computer 125, which transmits the computer file 140 to the proxy server computer 110.
The proxy server computer 110 verifies that the computer file 140 is free of malicious code. The process of verifying that the computer file 140 is free of malicious code is described in greater detail with respect to
If the computer file 140 contains malicious code, the proxy server computer 110 blocks transmission of the file 140. Alternately, the proxy server computer 110 can remove the malicious code from the computer file 140 and transmit the cleaned file to the client computer 120. In one embodiment, the proxy server computer 110 maintains a cache of recently downloaded files to minimize bandwidth demands between the proxy server computer 110 and the remote host computer 125.
While in the present embodiment, the proxy server 110 interacts with an independent client computer system 120, in an alternate embodiment, the functions of the client computer system 120 can be performed by an application running on an enterprise server or any combination of software and hardware.
Additionally, while the embodiments disclosed below refer to a proxy server 110 which manages HTTP communications between client computers 120 and the remote host computer 125, in alternate embodiments, the proxy server also manages File Transfer Protocol (FTP) communications and streaming media communications.
The processor 202 may be any specific or general-purpose processor such as an INTEL x86 or POWERPC-compatible central processing unit (CPU). The storage device 208 may be any device capable of holding large amounts of data, such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or some other form of fixed or removable storage device.
When the proxy server computer 110 transmits an HTTP request to the remote host computer 125, the remote host computer 125 begins to stream the computer file 140 to the proxy server computer 110.
When the proxy server computer 110 receives a first chunk 315 of the computer file 140, the security module 310 determines whether the chunk 315 is identical to a previously downloaded file chunk 315. The proxy server 110 checks a hash database for a hash entry whose identifier indicates that the stored hash is a hash of a chunk of the same file, and compares the newly generated hash to the stored hash. The organization of the hash database is described in greater detail with respect to
As used herein, a “hash” or “hash function” is a substantially collision free one-way function, from a variable sized input to a fixed size output. Normally, the output is smaller than the input. “One-way” means that it is easy to compute the output from the input, but computationally infeasible to compute the input from the output. “Substantially collision free” means that it is very difficult to find two or more inputs that hash to the same output. Examples of suitable hash functions usable in the present invention are MD5 and a CRC (Cyclic Redundancy Check) function.
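The fixed-size property described above can be illustrated with a minimal Python sketch using the two example functions named in this paragraph. The helper names `md5_of_chunk` and `crc32_of_chunk` are illustrative assumptions and do not appear in the specification.

```python
import hashlib
import zlib


def md5_of_chunk(chunk: bytes) -> str:
    """Return the MD5 digest of a chunk as a 32-character hex string."""
    return hashlib.md5(chunk).hexdigest()


def crc32_of_chunk(chunk: bytes) -> int:
    """Return the CRC-32 checksum of a chunk as an unsigned 32-bit integer."""
    return zlib.crc32(chunk) & 0xFFFFFFFF


# Inputs of very different sizes map to outputs of the same fixed size.
small_chunk = b"abc"
large_chunk = b"abc" * 100_000

small_digest = md5_of_chunk(small_chunk)
large_digest = md5_of_chunk(large_chunk)
```

Both digests are 32 hexadecimal characters long regardless of input length, and hashing the same bytes twice always yields the same value, which is what makes a stored hash usable for later comparison.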
The security module 310 then checks the hash table 325, which stores hashes of previously downloaded chunks, for a previously generated hash of the first chunk 315 of the requested computer file 140. The hash table 325 is stored in a local or remote cache and stores hashes of file chunks 315. The hash table 325 is periodically emptied when new threat definitions are made available to the security module 310. Alternately, the hash table 325 may be updated at regular intervals.
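One way such a table might be organized is sketched below. The class name `ChunkHashTable`, the `(file_id, chunk_index)` key, and the definitions version string are assumptions made for illustration; the specification does not prescribe this layout.

```python
from typing import Dict, Optional, Tuple


class ChunkHashTable:
    """Hypothetical in-memory table of hashes of previously scanned chunks."""

    def __init__(self, definitions_version: str) -> None:
        # Version of the threat definitions the stored hashes were scanned with.
        self.definitions_version = definitions_version
        # Key: (file identifier such as a URL, chunk position). Value: hex hash.
        self._entries: Dict[Tuple[str, int], str] = {}

    def lookup(self, file_id: str, chunk_index: int) -> Optional[str]:
        """Return the stored hash for a chunk, or None if there is no entry."""
        return self._entries.get((file_id, chunk_index))

    def store(self, file_id: str, chunk_index: int, chunk_hash: str) -> None:
        """Record the hash of a chunk belonging to a file that scanned clean."""
        self._entries[(file_id, chunk_index)] = chunk_hash

    def on_new_definitions(self, version: str) -> None:
        """Empty the table when newer threat definitions become available."""
        if version != self.definitions_version:
            self._entries.clear()
            self.definitions_version = version
```

Emptying the table on a definitions update means a file that was considered clean under older definitions is rescanned before any of its chunks are passed forward again.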
If the previously generated hash is present in the hash table 325, the security module 310 compares it to the hash of the first chunk of the requested computer file 140. If a hash corresponding to the new chunk 315 is not present or is not identical, the security module 310 downloads the full computer file 140 to the proxy server computer 110. The security module 310 scans the computer file 140 for the presence of malicious code. If the computer file 140 contains malicious code, the security module 310 blocks transmission of the computer file 140. If the computer file 140 does not contain malicious code, the computer file 140 is transmitted to the client computer 120.
If the hashes are identical, the security module 310 passes the first chunk to the client computer 120. This process is repeated for each succeeding chunk until a chunk is received whose hash does not match the corresponding hash in the hash table 325, or the computer file 140 is fully transmitted to the client computer 120. This process is described in greater detail with respect to
The security module 310 includes a selection module 508. The selection module 508 is configured to compare a hash of a chunk 315 of a requested computer file 140 to a previously generated hash 425. If the two hashes are identical, the selection module 508 passes the chunk 315 to the client computer 120. If the two hashes are not identical, the selection module 508 holds the chunk 315 until the entire computer file 140 has been downloaded.
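The pass-or-hold decision can be sketched as follows. This is a minimal illustration, assuming MD5 hashes and a plain dictionary mapping chunk position to a previously generated hash; the function name `select_chunks` is hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def select_chunks(
    chunks: Iterable[bytes], known_hashes: Dict[int, str]
) -> Tuple[List[bytes], List[bytes]]:
    """Split incoming chunks into those passed to the client and those held.

    A chunk is passed only while every chunk so far has matched its stored
    hash. After the first mismatch (or missing entry), all remaining chunks
    are held until the whole file has been downloaded.
    """
    passed: List[bytes] = []
    held: List[bytes] = []
    still_matching = True
    for index, chunk in enumerate(chunks):
        chunk_hash = hashlib.md5(chunk).hexdigest()
        if still_matching and known_hashes.get(index) == chunk_hash:
            passed.append(chunk)
        else:
            still_matching = False
            held.append(chunk)
    return passed, held


# Stored hashes from an earlier, clean download of a three-chunk file.
previous = [b"header", b"body", b"footer"]
stored = {i: hashlib.md5(c).hexdigest() for i, c in enumerate(previous)}

# The new download differs in its second chunk.
passed, held = select_chunks([b"header", b"BODY*", b"footer"], stored)
```

In this sketch the first chunk reaches the client immediately, while the changed chunk and everything after it are retained for scanning.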
The security module 310 additionally includes a hash generator 504. The hash generator 504 is configured to generate hashes of chunks 315 of files 140 for comparison with previously generated hashes stored in the hash table 325.
The security module 310 further includes a scanning module 502. The scanning module 502 is configured to check a computer file 140 for the presence of malicious code. The scanning module 502 typically checks selected areas of a computer file 140 for distinct code sequences or other signature information. Alternately, the scanning module 502 may check the computer file 140 for distinctive characteristics, such as a particular size.
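A signature check of the kind described here can be sketched in a few lines. The signature names and byte patterns below are invented placeholders, not real threat definitions, and the function searches the whole file rather than selected areas for simplicity.

```python
from typing import Dict, List

# Hypothetical signature set: name -> distinctive byte sequence.
EXAMPLE_SIGNATURES: Dict[str, bytes] = {
    "Example.A": b"\xde\xad\xbe\xef",
    "Example.B": b"EVIL_MARKER",
}


def scan_for_signatures(
    data: bytes, signatures: Dict[str, bytes] = EXAMPLE_SIGNATURES
) -> List[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


clean_result = scan_for_signatures(b"ordinary file contents")
infected_result = scan_for_signatures(b"prefix\xde\xad\xbe\xefsuffix")
```

An empty result list is treated as "no known signature found"; it does not cover the emulation-based techniques discussed for polymorphic viruses.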
The scanning module 502 can additionally apply more complex detection techniques to a computer file 140. For example, the scanning module 502 can detect the presence of a polymorphic encrypted virus. A polymorphic encrypted virus (“polymorphic virus”) includes a decryption routine and an encrypted viral body. To avoid standard detection techniques, polymorphic viruses use decryption routines that are functionally the same for each infected computer file 140, but have different sequences of instructions. To detect these viruses, the scanning module 502 applies an algorithm that loads the executable computer file 140 into a software-based CPU emulator acting as a simulated virtual computer. The computer file 140 is allowed to execute freely within this virtual computer. If the executable computer file 140 does contain a polymorphic virus, the decryption routine is allowed to decrypt the viral body. The scanning module 502 detects the virus by searching through the virtual memory of the virtual computer for a signature from the decrypted viral body. The scanning module 502 may also be configured to detect metamorphic viruses, which, while not necessarily encrypted, also vary the instructions stored in the viral body.
Furthermore, the security module 310 includes an update module 506. The update module 506 is configured to update the hash table 325 after a computer file 140 has been scanned for the presence of malicious code. The update module 506 generates new entries in the hash table 325 for files lacking entries and updates hashes 425 for files 140 that already have entries 415 in the hash table 325.
If an identical hash does not appear in the hash table 325, the proxy server computer 110 allows the complete computer file 140 to download 620 to the proxy server 110. When the computer file 140 has been downloaded in its entirety, the scanning module 502 scans the computer file 140 to determine 625 whether the computer file 140 contains malicious code. If the computer file 140 is found by the scanning module 502 to contain malicious code, the selection module 508 blocks 627 transmission of the computer file 140 to the client computer 120. Alternately, the scanning module 502 can repair the computer file 140 and transmit the repaired computer file 140 to the client computer 120. In one embodiment, the repaired computer file 140 is cached on the proxy server computer 110. If a similarly infected file 140 is detected by the proxy server computer 110, it can transmit the cached repaired file 140 to the client computer 120, rather than repair the infected file 140.
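The caching of repaired files might be sketched as below. Keying the cache by an MD5 hash of the infected file's contents is an assumption made for illustration, since the specification does not state how a "similarly infected" file is recognized; the `RepairCache` class and the `repair` callable are likewise hypothetical.

```python
import hashlib
from typing import Callable, Dict


class RepairCache:
    """Hypothetical cache of repaired files, keyed by hash of the infected file."""

    def __init__(self, repair: Callable[[bytes], bytes]) -> None:
        self._repair = repair          # routine that strips the malicious code
        self._repaired: Dict[str, bytes] = {}
        self.repairs_run = 0           # counts how often repair was actually run

    def get_clean_copy(self, infected: bytes) -> bytes:
        """Return a repaired copy, reusing a cached one when available."""
        key = hashlib.md5(infected).hexdigest()
        if key not in self._repaired:
            self._repaired[key] = self._repair(infected)
            self.repairs_run += 1
        return self._repaired[key]


# Placeholder repair routine: removes an invented marker from the file.
cache = RepairCache(lambda data: data.replace(b"EVIL_MARKER", b""))
first = cache.get_clean_copy(b"goodEVIL_MARKERdata")
second = cache.get_clean_copy(b"goodEVIL_MARKERdata")
```

The second request is served from the cache, so the repair routine runs only once for the two identical infected files.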
If the computer file 140 is found by the scanning module 502 to be free of malicious code, the hash generator 504 generates hashes of all the constituent chunks 315 of the computer file 140 and stores them in new entries in the hash table 325. These hashes are stored for later comparison against future files that the proxy server computer 110 downloads at the request of the client computer 120. The computer file 140 is then transmitted 635 to the client computer 120. In an alternate embodiment, the hash generator 504 generates new hashes of the chunks 315 of the computer file 140 as the chunks 315 are received, rather than generating the hashes after the file download is completed.
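The alternate embodiment, in which hashes are generated as chunks arrive, can be sketched as follows. The function name, the dictionary-based hash table, and the `is_clean` callable standing in for the scanning module are all assumptions for illustration. In this sketch the hashes are only committed to the table after the complete file has scanned clean.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def download_scan_and_record(
    file_id: str,
    chunks: Iterable[bytes],
    is_clean: Callable[[bytes], bool],
    hash_table: Dict[Tuple[str, int], str],
) -> Optional[bytes]:
    """Download a file chunk by chunk, scan it, and record hashes if clean.

    Returns the file contents to transmit, or None if transmission is blocked.
    """
    received: List[bytes] = []
    pending_hashes: List[str] = []
    for chunk in chunks:
        received.append(chunk)
        # Hash each chunk as it is received instead of after the download.
        pending_hashes.append(hashlib.md5(chunk).hexdigest())

    data = b"".join(received)
    if not is_clean(data):
        return None  # malicious code found: store nothing and block the file

    # Commit the hashes only once the whole file is known to be clean.
    for index, chunk_hash in enumerate(pending_hashes):
        hash_table[(file_id, index)] = chunk_hash
    return data


table: Dict[Tuple[str, int], str] = {}
no_marker = lambda data: b"EVIL_MARKER" not in data  # placeholder scanner
result = download_scan_and_record("file-a", [b"one", b"two"], no_marker, table)
blocked = download_scan_and_record("file-b", [b"EVIL_", b"MARKER"], no_marker, table)
```

The second call shows why the scan runs on the reassembled file: the placeholder marker spans a chunk boundary and would be missed by inspecting chunks in isolation.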
If a hash that is identical to the generated hash of the downloaded chunk 315 of the computer file 140 appears in the hash table 325, this means that the file 140 has likely been previously downloaded and scanned by the proxy server 110. Thus, the selection module 508 transmits 638 the chunk 315 to the client computer 120. The hash generator 504 then generates 640 a hash of the next file chunk 315, and the selection module 508 compares 645 the hash to a corresponding hash in the hash table 325. If the hashes are different or if no corresponding hash exists in the hash table 325, the selection module 508 determines that the file 140 is not identical to a previously scanned file and ends 650 the download.
In an alternate embodiment, the selection module 508 permits the file 140 to download to the proxy server 110 as indicated in step 620. The scanning module 502 then scans 625 the file 140 for the presence of malicious code. If the file 140 contains malicious code, the scanning module 502 can cancel 627 the download of the file 140 or clean the file 140 of malicious code and pass it to the client computer 120. If the file 140 does not contain malicious code, the selection module 508 passes 635 the file 140 to the client computer 120 and updates the associated hash entries 415 to store the hashes 425 of the new version of the file 140.
If more chunks 315 are determined 660 to remain in the computer file 140, steps 640, 645, 650, and 655 are repeated until the computer file 140 has been transmitted.
The above description is included to illustrate the operation of the preferred embodiments and is not meant to limit the scope of the invention. The scope of the invention is to be limited only by the following claims. From the above discussion, many variations will be apparent to one skilled in the relevant art that would yet be encompassed by the spirit and scope of the invention.
| Number | Date | Country |
|---|---|---|
| 20040181687 A1 | Sep 2004 | US |