The present application is related to commonly owned and assigned application Ser. No. 11/462,943, entitled S
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates to computer management. In particular, but not by way of limitation, the present invention relates to systems and methods for detecting and removing pestware.
Personal computers and business computers are continually attacked by trojans, spyware, and adware, collectively referred to as “malware” or “pestware.” These types of programs generally act to gather information about a person or organization, often without the person or organization's knowledge. Some pestware is highly malicious. Other pestware is non-malicious but may cause issues with privacy or system performance. And yet other pestware is actually beneficial or wanted by the user. Wanted pestware is sometimes not characterized as “pestware” or “spyware.” But, unless specified otherwise, “pestware” as used herein refers to any program that collects and/or reports information about a person or an organization and any “watcher processes” related to the pestware.
Software is available to detect and remove some pestware, but many types of pestware are difficult to detect with typical techniques. For example, pestware may be obfuscated with encryption techniques so that a pestware file stored on a system hard drive may not be readily recognizable as a file that has spawned a pestware process. In yet other instances, pestware is known to be polymorphic in nature so as to change its code, data, size and/or its starting address in memory. In yet other instances, variants of known pestware are developed that alter relatively little of the functional aspects of the pestware, yet render the pestware undetectable.
Although present pestware-detection systems detect some or even most pestware, they are not sufficiently accurate or otherwise satisfactory. Accordingly, a system and method are needed to address the shortfalls of present technology and to provide other new and innovative features.
Exemplary embodiments of the present invention that are shown in the drawings are summarized below. These and other embodiments are more fully described in the Detailed Description section. It is to be understood, however, that there is no intention to limit the invention to the forms described in this Summary of the Invention or in the Detailed Description. One skilled in the art can recognize that there are numerous modifications, equivalents and alternative constructions that fall within the spirit and scope of the invention as expressed in the claims.
The present invention can provide a system and method for defining and detecting pestware. One embodiment includes receiving a file and placing at least a portion of the file into a processor-readable memory of a computer. A plurality of execution paths within code of the pestware file are followed and particular instructions within the execution paths are identified. A representation of the relative locations of each of the particular instructions within the code of the file is compared against a pestware-definition file so as to determine whether the file is a potential pestware file.
As previously stated, the above-described embodiments and implementations are for illustration purposes only. Numerous other embodiments, implementations, and details of the invention are easily recognized by those of skill in the art from the following descriptions and claims.
Various objects and advantages and a more complete understanding of the present invention are apparent and more readily appreciated by reference to the following Detailed Description and to the appended claims when taken in conjunction with the accompanying Drawings wherein:
Referring now to the drawings, where like or similar elements are designated with identical reference numerals throughout the several views. Referring first to
As shown, N protected computers 102(1-N) are coupled to a host 104 via a network 106 (e.g., the Internet). The host 104 in this embodiment includes a threat research portion 108 and a code-graph definition engine 110. Also depicted are data storage devices 112, 114 that include collected threat data 112 and code-graph-based definitions 114. The term “protected computer” is used herein to refer to any type of computer system, including personal computers, handheld computers, servers, firewalls, etc.
In accordance with several embodiments, the threat research portion 108 identifies and stores pestware threats in the threat database 112. The threat research portion 108 may, for example, actively search for pestware using bots that scour the Web for potential pestware. In addition, one or more of the N protected computers 102(1-N) may provide data, via the network 106, about potential pestware to the threat research portion 108.
The code-graph definition engine 110 in this embodiment is configured to retrieve the collected pestware threats from the threat database 112 and generate code-graph-based definitions that are stored in the definition database 114. An update service 116 then makes the code-graph-based definitions available to the computers 102(1-N). The illustrated arrangement of these components is logical and not meant to be an actual hardware diagram or a detailed architecture of an actual software implementation. Thus, the components can be combined or further separated in an actual implementation. Moreover, in light of this specification, the construction of each individual component is well-known to those of skill in the art.
As discussed further herein, using code-graph-based pestware definitions provides several advantages over known pestware detection methodologies. In general, the code-graph-based definitions include a collection of data that is dependent upon the overall functionality of the pestware files so that minor variations to a pestware file do not render the pestware undetectable. In many embodiments for example, the code-graph-based definitions include data that is dependent upon occurrences of one or more types of calls as well as data that is dependent upon where, in the structure of the code, the occurrences take place.
In some embodiments for example, the code-graph-based definitions may include both data that captures the occurrences of one or more types of system calls and data that captures the connectedness of the system calls in the pestware file. In other embodiments, the code-graph-based definitions may include data that is dependent upon one or more parameters of function calls (e.g., system calls) and data that is dependent upon an order of the function calls. In yet other embodiments, the code-graph-data may include data that is dependent upon particular sequences of code and the connectedness of the particular pieces of code.
Notably, if the code-graph-based definitions are based upon function calls, the identity of each function call (e.g., system call) need not be captured in the code-graph-based definitions in order for the code-graph-based definitions to provide a useful definition of the pestware. This is in contrast to known pestware detection techniques, which parse through files to locate commands that are compared with a listing of operations known to be potentially dangerous. In other words, instead of analyzing a file to determine if it includes commands that carry out operations known to be dangerous, in many embodiments of the present invention, files are analyzed based upon the occurrence of function calls irrespective of the functions associated with the function calls.
Referring next to
As shown, the file storage device 206 provides storage for a collection of files which includes a suspect file 208 (e.g., received via the network 106 from a URL) and code-graph-based definitions 210 received from the update service 116 of the host 104. The file storage device 206 is described herein in several implementations as a hard disk drive for convenience, but this is certainly not required, and one of ordinary skill in the art will recognize that other storage media may be utilized without departing from the scope of the present invention. In addition, one of ordinary skill in the art will recognize that the storage device 206, which is depicted for convenience as a single storage device, may be realized by multiple (e.g., distributed) storage devices.
As shown, an anti-spyware application 220 includes a detection module 222, a removal module 224, and a reporting module 226 which are implemented in software and are executed from the memory 204 by the processor 202. In addition, suspect-process code 228, which corresponds to the suspect file 208, is also depicted in memory 204.
The anti-spyware application 220 can be configured to operate on personal computers (e.g., handheld, notebook or desktop), servers or any device capable of processing instructions embodied in executable code. Moreover, one of ordinary skill in the art will recognize that alternative embodiments, which implement one or more components in hardware, are well within the scope of the present invention. It should be recognized that the illustrated arrangement of these components is logical and not meant to be an actual hardware diagram or a detailed architecture of an actual software implementation. Thus, the components can be combined or further separated in an actual implementation. Moreover, in light of this specification, the construction of each individual component is well-known to those of skill in the art.
Also shown within the detection module 222 are a code-graph engine 230 and a comparison module 232. In the exemplary embodiment, the code-graph engine 230 is configured to generate a code graph of the suspect code 228 and the comparison module 232 is configured to compare the code graph with the code-graph-based definitions 210 to assess whether the suspect code 228 is likely pestware code. Depending upon the results of the comparison carried out by the comparison module, the suspect file 208 and code 228 are removed and/or a user of the computer 200 is notified about the likelihood the suspect file 208 is a pestware file.
The configuration of the code-graph engine 230 may vary depending upon the type of code graph that the code-graph based definitions are based upon. For example, if the code-graph definition engine 110 generates code-graph-based definitions that include a representation of system calls and the relative locations of the system calls for each pestware file, then the code-graph engine 230 may be configured to generate the same type of representation of system calls along with information that captures the relative locations of the system calls so that the code-graph generated by the code-graph engine 230 is comparable with the code-graph-based definitions 210.
Referring next to
As shown in
Once code of the pestware file has been retrieved, a plurality of potential-execution paths within the code are followed (Block 308), and particular instructions within the execution paths are identified (Block 310). For example, starting with an entry point of the code from the pestware file, the code may be followed until there is a conditional jump in the code, which separates the path into two paths. Each of the separate paths is then followed, and if each of the separate paths splits into additional paths, then each of the additional paths is also followed.
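By way of illustration only, the path-following step described above might be sketched as follows. The toy instruction format (one instruction per address, with the mnemonics "jcc", "jmp" and "ret") and the name `follow_paths` are assumptions made for this sketch and are not taken from the specification; calls are simply stepped over rather than followed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy format: address -> (mnemonic, jump/call target or None).
Instruction = Tuple[str, Optional[int]]


def follow_paths(code: Dict[int, Instruction], entry: int) -> List[int]:
    """Return every address reachable from `entry`, in first-visit order."""
    visited: Set[int] = set()
    order: List[int] = []
    pending = [entry]  # starting points of paths still to be followed
    while pending:
        addr = pending.pop()
        # Stop when leaving the code or re-entering an already followed path.
        while addr in code and addr not in visited:
            visited.add(addr)
            order.append(addr)
            op, target = code[addr]
            if op == "ret":
                break  # this path ends here
            if op == "jmp":
                addr = target  # unconditional jump: a single successor
                continue
            if op == "jcc":
                pending.append(target)  # taken branch is followed later
            addr += 1  # fall through (toy: one address per instruction)
    return order


code = {
    0: ("mov", None),
    1: ("jcc", 4),    # conditional jump: the path splits in two
    2: ("call", 100),
    3: ("ret", None),
    4: ("call", 200),
    5: ("jmp", 2),    # rejoins an already visited path
}
reachable = follow_paths(code, 0)
```

The visited set keeps the traversal finite when paths rejoin or loop; a real disassembler would also need variable-length instruction decoding, which this sketch omits.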
While following each of the potential execution paths, particular instructions are identified (Block 310). In some embodiments, the identified instructions are function calls that are made in the code. For example, system calls may be identified within the execution paths of the code. In other embodiments, the identified instructions may be a particular sequence of instructions that are identified in the code. In other embodiments, the identified instructions can be function calls to addresses to portions of the processor-readable memory that are outside of the memory occupied by the code of the pestware file. It is contemplated, however, that one or more other types of code or code sequences may be identified and used to characterize the pestware file.
In embodiments where system calls are identified, instructions that are not jumps or conditional jumps may be ignored, and calls to addresses made within the code of the pestware may be assumed to be non-system calls and also ignored.
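As a hedged sketch of the filtering just described, the following treats any call whose target lies outside the address range occupied by the code as a candidate system call and ignores calls that land inside that range. The tuple layout and the names `is_external_call` and `external_calls` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical record: (address of instruction, mnemonic, target or None).
Instr = Tuple[int, str, Optional[int]]


def is_external_call(op: str, target: Optional[int],
                     code_start: int, code_end: int) -> bool:
    """True for a call whose target is outside [code_start, code_end)."""
    if op != "call" or target is None:
        return False
    return not (code_start <= target < code_end)


def external_calls(instructions: List[Instr],
                   code_start: int, code_end: int) -> List[Tuple[int, int]]:
    """Keep (call site, target) pairs for calls leaving the code region."""
    return [(addr, target) for addr, op, target in instructions
            if is_external_call(op, target, code_start, code_end)]


instructions = [
    (0x1000, "mov", None),
    (0x1002, "call", 0x1050),      # internal call: assumed non-system, ignored
    (0x1007, "call", 0x7FFE0000),  # outside the code: candidate system call
    (0x100C, "jcc", 0x1020),
]
found = external_calls(instructions, code_start=0x1000, code_end=0x2000)
```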
As shown in
In one embodiment, if function calls (e.g., system calls) are identified, a representation of the address of the function call is stored in connection with information that connects each function call with other function calls. As an example, the representation of the address may be the address itself, a check sum, or a hash of the address, and the information connecting the function calls may be information that relates the function calls to one another by the paths in the code where the function calls occur. It should be recognized that using an address of each system call is merely one way of attaching an identifier to each call. Moreover, the actual system functionality associated with each function call need not be known.
Although the function associated with each function call in many embodiments is not determined, it is beneficial in these embodiments to attach an identifier to the function calls so that if a call is repeated, there is a way of recognizing and tracking the number of times a particular function call is made. It is contemplated, for example, that the repetition of particular function calls as well as the order in which function calls (e.g., system calls) are made in pestware code may be used to construct a definition for the pestware.
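One possible way of attaching such identifiers is sketched below, under the assumption that a truncated SHA-256 hash of the call address serves as the identifier; the specification names a hash only as one option and does not prescribe an algorithm. The names `call_id` and `summarize` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List, Tuple


def call_id(address: int) -> str:
    """Opaque identifier for a call target; says nothing about its function."""
    digest = hashlib.sha256(address.to_bytes(8, "little")).hexdigest()
    return digest[:8]  # truncated only to keep the example readable


def summarize(call_addresses: List[int]) -> Tuple[List[str], Counter]:
    """Return the identifiers in call order plus a per-identifier count."""
    ids = [call_id(a) for a in call_addresses]
    return ids, Counter(ids)


# The same target called twice yields the same identifier both times.
ids, counts = summarize([0x7FFE0000, 0x7FFE0040, 0x7FFE0000])
```

Because the same address always yields the same identifier, repetition and order are both preserved without knowing what the called function does.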
In some embodiments, the relative locations of the occurrences of particular instructions (e.g., function calls) are assembled as a tree-shaped graph in the pestware-definition file that is characterized by branches that include the particular instructions (e.g., function calls), and nodes that correspond to conditional jumps within the code. To simplify the tree, and hence the quantity of data associated with the tree, branches that do not include the particular instructions (e.g., system calls) may be ignored.
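The tree simplification described above can be sketched as follows, assuming a hypothetical `Branch` structure in which each branch carries the identifiers of the calls found on it and child branches begin at conditional jumps. Branches that contain no calls, and lead to none, are dropped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Branch:
    calls: List[str] = field(default_factory=list)           # call identifiers
    children: List["Branch"] = field(default_factory=list)   # after a jump


def prune(branch: Branch) -> Optional[Branch]:
    """Return a copy without call-free subtrees, or None if nothing remains."""
    kept = []
    for child in branch.children:
        pruned_child = prune(child)
        if pruned_child is not None:
            kept.append(pruned_child)
    if not branch.calls and not kept:
        return None  # no calls here or below: ignore this branch
    return Branch(list(branch.calls), kept)


tree = Branch(["A"], [
    Branch([], [Branch([])]),  # no calls anywhere on this side
    Branch(["B"]),
])
pruned = prune(tree)
```

A call-free branch is retained only when something beneath it contains a call, so the connectedness of the remaining calls is preserved.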
It has been found that, even when the branches that do not contain the particular instructions are ignored, comparing a graph-based pestware definition with a graph generated from a suspect file (e.g., the suspect file 208) may be a processor-intensive process. As a consequence, in many variations the graph is simplified by removing cycles in the tree-shaped graph to create a simplified tree. Although data is missing, it has been found that graph-based pestware definitions may be simplified in this manner and yet be effective to identify pestware.
The extent to which the graph is simplified may vary depending upon factors including the accuracy desired, the processing capabilities of the computer and/or the desired rate at which files are scanned. Although certainly not required, it has been found that a graph may be simplified so that it is a linear representation of the order in which occurrences of the particular instructions occur. For example, the graph may be a linear call graph that includes data that defines an order in which system calls are made.
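As one illustrative reading of a linear call graph, a tree of calls may be flattened by a depth-first walk so that only the order of the calls remains. The nested-dictionary layout and the name `linearize` are assumptions of this sketch; the specification does not state how the linear order is derived.

```python
from typing import Any, Dict, List


def linearize(node: Dict[str, Any]) -> List[str]:
    """Depth-first flattening: this branch's calls, then each child's."""
    order = list(node["calls"])
    for child in node.get("children", []):
        order.extend(linearize(child))
    return order


tree = {
    "calls": ["A"],
    "children": [
        {"calls": ["B"], "children": []},
        {"calls": ["C", "A"]},  # a repeated call keeps its place in the order
    ],
}
linear = linearize(tree)
```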
In some instances, pestware is designed to include conditional jumps and/or function calls that include dynamic addresses. For example, pestware may be designed so that an address is loaded into a register and a jump instruction then jumps to the value in the register. As a consequence, in some embodiments when the graph is assembled, instructions that precede the jump or call are emulated to determine the value of the register. In this way, more call and jump destinations may be determined and a more complete graph may be assembled.
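A minimal sketch of this emulation, assuming a toy instruction set limited to "mov" and "add" ahead of a register-indirect "jmp" or "call", is given below. The tuple layout and the name `resolve_indirect_target` are hypothetical; a practical emulator would cover far more of the instruction set.

```python
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[int, str, None]       # immediate, register name, or unused
Instr = Tuple[str, str, Operand]      # (mnemonic, destination, source)


def resolve_indirect_target(instructions: List[Instr]) -> Optional[int]:
    """Emulate up to the first jmp/call and return the register's value."""
    regs: Dict[str, Optional[int]] = {}

    def value(operand: Operand) -> Optional[int]:
        # Register names are looked up; immediates are used directly.
        return regs.get(operand) if isinstance(operand, str) else operand

    for op, dst, src in instructions:
        if op == "mov":
            regs[dst] = value(src)
        elif op == "add":
            a, b = regs.get(dst), value(src)
            regs[dst] = None if a is None or b is None else a + b
        elif op in ("jmp", "call"):
            return regs.get(dst)  # None when the value could not be determined
    return None


target = resolve_indirect_target([
    ("mov", "eax", 0x1000),
    ("add", "eax", 0x20),
    ("jmp", "eax", None),   # jump to the value held in eax
])
```

Returning `None` for an unknown register lets the graph builder record the destination as unresolved instead of guessing.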
As depicted in
From the perspective of a protected computer, when a file is received at the protected computer (e.g., via the network communication module 212 or portable media), at least a portion of the file is placed in processor-readable memory (e.g., memory 204) of the computer (Blocks 322, 324). Once in memory, a plurality of execution paths within the code are followed, and particular instructions within the execution paths are identified (Blocks 326, 328). As will be appreciated by one of ordinary skill in the art, the manner in which the steps depicted by Blocks 326 and 328 are carried out may vary, but these steps are dependent upon how the pestware definition file is generated at Blocks 308-310. For example, if system calls are identified in Block 310, then system calls are also identified in Block 328.
As shown in
In embodiments where the particular instructions are function calls (e.g., system calls), the relative locations of function calls found within the code of the analyzed file are compared against the relative locations of the function calls in the pestware-definition file. In some of these embodiments a comparison of locations of identifiers of the function calls of the analyzed file and the pestware-definition file is made. As discussed, the manner in which each function call is represented may be arbitrary in that each function call may be given an identifier that may or may not connote the actual function associated with the function call.
When comparing locations of each of the particular instructions (Block 330), in many embodiments the longest matching sequence of particular instructions between the pestware-definition file and the analyzed file is found. If the particular instructions are system calls for example, the longest matching sequence of system calls between the pestware file and the analyzed file is found.
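The specification does not state whether the longest matching sequence must be contiguous. The sketch below assumes it need not be and computes a longest common subsequence of call identifiers by dynamic programming; a contiguous variant would be an equally plausible reading. The function name and the sample identifiers are hypothetical.

```python
from typing import List, Sequence


def longest_common_subsequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest sequence of items appearing in the same order in both inputs."""
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one longest subsequence.
    result: List[str] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


definition = ["c1", "c2", "c3", "c4"]            # calls in the definition
analyzed = ["c1", "x", "c2", "c3", "y", "c4"]    # calls in the analyzed file
match = longest_common_subsequence(definition, analyzed)
```

Here the two inserted calls in the analyzed file do not prevent the full definition sequence from being matched in order.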
Beneficially, comparing where particular instructions occur makes it more difficult for producers of pestware to effectively disguise pestware with minor alterations. Specifically, due to time and cost considerations, pestware producers are more likely to make alterations that affect how pestware code appears, but not how the pestware code operates. And the order in which particular instructions occur (e.g., function calls) is determined by how the code operates. As a consequence, unless a pestware file is substantially altered so that the functionality of the pestware is altered, the pestware detection techniques described herein remain effective.
It should be recognized that the comparison between the pestware-definition file and the file being analyzed may generate substantially less than a 100 percent match, and yet, provide a strong indication that the analyzed file is a pestware file. For example, in many instances pestware producers are more inclined to add functionality to their pestware offerings. And when adding new functionality, the existing core functionality is often left in place. As a consequence, if the additional functionality corresponds to 30 percent of the function calls in an enhanced pestware file, a match between 60 percent of the function calls of the enhanced pestware file and a pestware-definition based upon the original pestware file strongly suggests that the enhanced pestware file is indeed pestware.
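The arithmetic behind such a partial match can be illustrated as follows. The sketch uses Python's standard `difflib.SequenceMatcher` to count calls that match in order, which is a stand-in chosen for brevity and not a technique named in the specification; the call sequences are invented for the example.

```python
from difflib import SequenceMatcher
from typing import Sequence


def match_fraction(definition: Sequence[str], analyzed: Sequence[str]) -> float:
    """Fraction of the analyzed file's calls matched, in order, by the definition."""
    if not analyzed:
        return 0.0
    matcher = SequenceMatcher(None, definition, analyzed, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(analyzed)


original = list("ABCDEFG")            # 7 calls in the original pestware
enhanced = original + list("XYZ")     # 3 added calls: 30 percent of 10
fraction = match_fraction(original, enhanced)
```

With three of ten calls added, seven of the ten calls in the enhanced file still match the original definition, so the fraction is 0.7 even though the files differ.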
Similarly, if a portion of an original pestware file is removed or replaced, there may be substantially less than 100 percent match between the pestware-definition file and the enhanced pestware file. But if the remaining portion of original pestware file is a substantial portion of the enhanced file (e.g., the remaining portion includes 60 percent of the function calls of the enhanced file) there may still be enough matches (e.g., 50 percent) between the original pestware-definition and the enhanced pestware file to at least render the analyzed file a potential pestware file.
As a consequence, in some embodiments a weighting scheme is used in connection with the type of match found between the pestware-definition file and the analyzed file. For example, a greater weight may be applied to a particular percentage of matching function calls when certain function calls in the analyzed file are missing as compared to the same percentage of matching function calls when certain function calls have been replaced with other function calls.
In addition, it is contemplated that, based upon the extent the pestware-definition matches the analyzed file, the pestware file may be quarantined, removed or a user of the computer may be informed about the likelihood that the analyzed file is a pestware file.
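A weighting scheme and a resulting choice of action might be sketched as follows. Every numeric value here (the weights and both thresholds) and every name is a hypothetical placeholder chosen for illustration; the specification gives no concrete figures.

```python
# Hypothetical weights: a match where calls are merely missing counts for
# more than the same match where calls were replaced by other calls.
WEIGHTS = {"missing": 1.0, "replaced": 0.8}

# Hypothetical thresholds on the weighted score.
QUARANTINE_AT = 0.75
NOTIFY_AT = 0.5


def weighted_score(match_fraction: float, difference_kind: str) -> float:
    """Scale the raw match fraction by the kind of difference observed."""
    return match_fraction * WEIGHTS[difference_kind]


def choose_action(score: float) -> str:
    """Map a weighted score to one of the responses described above."""
    if score >= QUARANTINE_AT:
        return "quarantine"
    if score >= NOTIFY_AT:
        return "notify"
    return "ignore"


action_missing = choose_action(weighted_score(0.6, "missing"))
action_replaced = choose_action(weighted_score(0.6, "replaced"))
```

Under these placeholder values, the same 60 percent match leads to a user notification when calls are missing but to no action when calls were replaced.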
In conclusion, the present invention provides, among other things, a system and method for defining and detecting pestware. Those skilled in the art can readily recognize that numerous variations and substitutions may be made in the invention, its use and its configuration to achieve substantially the same results as achieved by the embodiments described herein. Accordingly, there is no intention to limit the invention to the disclosed exemplary forms. Many variations, modifications and alternative constructions fall within the scope and spirit of the disclosed invention as expressed in the claims.
Number | Date | Country |
---|---|---|
20080052679 A1 | Feb 2008 | US |