The present invention relates to systems for combating spyware on computers and in particular to a system that may automatically detect and generate signatures for unknown spyware.
Spyware refers to programs that run on computers without the knowledge or permission of a user and which steal sensitive or private information from the user and forward that information to a remote site. Examples of spyware are keyloggers which capture a user's keystrokes, tracking software which monitors the user's destinations on the web, screen scrapers which pull data from the user's display screen, and Trojans which download and install other spyware. Some spyware masquerades as benign computer programs intended to provide useful functionality, such as browser plug-ins and extensions.
The stolen information obtained by spyware can be used for criminal activity, for example, if financial information or passwords are stolen. Increasingly, spyware is used to target unwanted advertising to the user, triggered, for example, by the user's browsing activity.
Unlike other malware, such as viruses, spyware is intended to remain hidden on the computer. This very characteristic makes it difficult to detect spyware; a recent study has reported that as many as 80% of computers are infected with spyware.
Current techniques for spyware detection use “signatures” of known spyware, for example character strings found in the binary executables of the spyware or found in network traffic produced by the spyware. Detecting spyware is done by analyzing the application programs on the computer and/or monitoring network communications for matches to the signatures.
Generating signatures for this approach is a time-consuming manual process. Because signatures are normally developed on a post hoc basis, this technique is principally effective against known spyware for which a signature has been developed, and is relatively ineffective against new or unknown spyware.
The present invention automatically detects both known and unknown spyware by monitoring deviations from normal network activity when a computer is subjected to a set of test “user” inputs. New outgoing network packets that carry information about the user (for example information from the test user inputs) and/or that provide information to an unknown remote server, are a strong indication of a spyware infection. When spyware is discovered, a warning may be provided to the user. In addition, the outgoing network packets produced by the spyware, identified by this process, may be used to simply and automatically generate signatures of the spyware for use by other computers.
Specifically, the present invention provides a method of detecting spyware comprising the steps of identifying a set of standard output packets generated by a “clean” computer in response to a given set of user inputs. These same user inputs are then applied to an “unknown” computer, and differences between the standard output packets and the output packets of the “unknown” computer are identified. Based on these differences, the likelihood that the unknown computer is infected with spyware is assessed.
It is thus one feature of at least one embodiment of the invention to provide an automatic method of detecting unknown spyware based on behavior rather than signatures. It is another feature of at least one embodiment of the invention to provide a simple and reliable method to distinguish normal browser behavior from spyware behavior.
The invention may determine whether the differences in output packets include output packets addressed to an unknown server.
It is thus another feature of at least one embodiment of the invention to eliminate false positives, for example, resulting from minor modification of benign web sites used in developing the standard output packets.
The invention may determine whether the differences in output packets include output packets that have data correlated with the given set of user inputs.
It is another feature of at least one embodiment of the invention to provide a detection system that is well suited to identify a fundamental characteristic of spyware, namely the sending out of user-derived information.
The invention may assess a threat level based on both whether the output packets from the unknown computer include addresses of an unknown server and whether the data is correlated with the given set of user inputs.
It is therefore another feature of at least one embodiment of the invention to provide for a multilevel ranking of the probability that a given program is spyware to allow tailoring of the detection process to the requirements of a user.
The user inputs may be automatically generated and input to the computer by a program running on the computer.
It is another feature of at least one embodiment of the invention to provide for automatic testing for spyware without user intervention.
The given set of user inputs may be selected from a set of common server addresses.
It is another feature of at least one embodiment of the invention to provide benchmark user inputs that are commonly used and to which spyware is likely to be sensitive.
The given set of user inputs may be selected in part by analyzing executable programs on the computer for web addresses.
It is a feature of at least one embodiment of the invention to tailor the user input to spyware already on the user's system.
As used herein, the “clean” computer having a known clean state and the “unknown” computer having an unknown state may be implemented as different computer hardware, or may be the same computer hardware executing the same program at different times, or the same computer hardware executing two independent instances of a program.
It is thus another feature of at least one embodiment of the invention to provide a system that may readily be used on an individual computer or multiple computers with arbitrary hardware and software configurations.
A “clean” and an “unknown” computer, for example, may be implemented as two browser programs executing on the same computer hardware, where one browser is a standard browser, susceptible to spyware, and the other browser is configured not to accept browser plug-ins.
It is thus another feature of at least one embodiment of the invention to provide a system that may be used on a continuous basis, on a single machine, to analyze and detect possible spyware infection. In this case, the standard user inputs may be any inputs by the actual user.
Alternatively, the standard user inputs may be developed on different computer hardware initialized with the same software as the “unknown” computer and having a known clean state.
It is therefore another feature of at least one embodiment of the invention to provide a system that may be used by a computer manufacturer for a standard line of computers manufactured by that manufacturer.
The invention may further include the step of extracting a signature from the differences between the standard output packets and the output packets of the “unknown” computer and providing signatures to a monitoring program.
It is thus another aspect of the invention to provide a system that may automatically generate spyware signatures for use with network intrusion detection devices and the like.
The signature may be a longest common subsequence of the differences.
It is another feature of at least one embodiment of the invention to provide a signature generating mechanism that makes use of the differential analysis already used by the present invention in detecting spyware behavior.
The steps of the invention may be repeated periodically, or may be repeated upon a loading of new programs into the computer of unknown state.
It is another feature of at least one embodiment of the invention to provide a system that may operate in the background without user intervention.
It is another feature of at least one embodiment of the invention to provide a system that does not require access to a computer that is wholly free from spyware.
These particular features and advantages may describe only some embodiments falling within the claims and thus do not define the scope of the invention.
Referring now to
The network 10 may further include a network intrusion detection system (NIDS) 22 attached to the network line 16 to monitor network traffic to detect malware, including spyware, viruses, and the like. The NIDS 22 may hold a number of signatures 24 of different types of malware, including viruses, spyware, and the like, and may, for example, be a computer running a program such as “Snort”, an open source intrusion detection/prevention system available at http://www.snort.org, or “Bro”, an intrusion detection system available at http://bro-ids.org.
The present invention may be implemented by programs 26 running on one or more of the computers 20a-20d. In a first implementation, the program 26 runs on a single computer 20d to detect spyware infecting the computer 20d and to provide corresponding signatures 24 by a signature transfer path 28 to the NIDS 22. In this embodiment, the program 26 may alternatively or in addition notify the operator of the computer 20d of the presence of spyware via warning signal 68, for example transmitted to a local or remote monitoring terminal 29.
In a second embodiment, the program 26 runs on computers 20b and 20c. In this mode, the computer 20c provides data about normal computer operation (to be described below) via connection 30 to computer 20b, which uses that data in the detection of spyware on computer 20b and/or the generation of signatures or warning signals.
In a third embodiment, the program 26 operates solely on computer 20a and provides two instances of a program, such as a browser, one instance providing data about normal computer operation, and one instance susceptible to spyware infection and under continual supervision. In this embodiment, as will be described below, the outputs of the program instances are compared to detect spyware.
Referring now to
The operating system 32 may also provide for an Internet interface 40 to network connections 18 or the like also by means of an API.
The interfaces 34 and 40 provide a simple mechanism for application programs 42 to communicate with external hardware and devices. In this case, the application programs 42 may be a browser 44 such as the Internet Explorer browser manufactured by Microsoft. Such a browser 44 may permit one or more plug-ins 46 to enhance or customize the operation of the browser 44 and may also harbor spyware. The program 26 of the present invention may also be an application program 42 with communication via API calls with the interfaces 34 and 40.
Referring still to
Referring now to
The user inputs 53 of the test input set 52 are first applied to a clean version of the application program 42 to be tested, where the clean version of the application program 42 is ideally known to be free from spyware and on a machine that is free from spyware. This process may be conducted on a single computer 20d, for example when it is first commissioned, or on a separate machine, for example computer 20c, being maintained in a pristine state.
The user inputs 53 are provided through interface 34 to the browser 44, which produces output packets 51 through interface 40 that are recorded in a standard behavior table 48 by the program 26. Generally, multiple sets of packets 51 are collected for each set of user inputs 53. Referring to the following Table 1, a user input 53 of www.google.com may produce two output packets 51 for standard behavior table 48, corresponding to a request for data from the Google web site and a request for an image embedded in the main page data of the accessed Google web site. This process of generating standard behavior table 48 may be done as infrequently as once.
Note that each test input set 52 will normally include multiple user inputs 53 for different remote server sites and one or more user inputs 53 for each remote server site.
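By way of illustration only, the recording of a behavior table may be sketched as follows. The sketch assumes that each replay of a user input yields a list of packet strings; the function name `build_behavior_table` and the sample packet strings are hypothetical placeholders and do not reproduce the contents of Table 1.

```python
from collections import defaultdict


def build_behavior_table(runs):
    """Merge several replays into a table: user input -> set of packets seen.

    `runs` is an iterable of (user_input, packets) pairs, one pair per replay,
    so the same user input may appear more than once.
    """
    table = defaultdict(set)
    for user_input, packets in runs:
        table[user_input].update(packets)
    return dict(table)


# Hypothetical replays of one user input on the clean machine.
runs = [
    ("www.google.com", ["request: main page", "request: embedded image"]),
    ("www.google.com", ["request: main page"]),
]
standard_table = build_behavior_table(runs)
```

Taking the union over several replays is one possible design choice; it lets a packet that appears only occasionally on the clean machine still count as standard behavior.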
At a subsequent time on the same computer 20d (in the first embodiment) or on a different unknown computer 20b (in the second embodiment), the same user inputs 53 may be applied through network interface 34′ to a new application program 42′, for example a possibly infected browser 44′ on the unknown computer 20b or the same browser 44 at a later time on computer 20d. The browser 44′ represents any application program 42 with an unknown state with respect to spyware infection and, in response to the test input set 52, produces through interface 40′ output packets 51 that are collected in an actual behavior table 50 shown in the following Table 2.
Generally, as shown, the actual behavior table 50 may include additional output packets 51 beyond those invoked on the clean machine. In this case, those output packets include captured browsing behavior (in the form of URLs) sent to a spyware server and include a URL of the spyware server (not shown in the table).
Using the data of the standard behavior table 48 and the actual behavior table 50, the program 26 then compares the corresponding output packets of standard behavior table 48 to the actual behavior table 50 for each entry of the user inputs 53 to identify those packets of actual behavior table 50 that are not standard responses as shown by the corresponding record of standard behavior table 48. In this case the packets directed to the spyware site (e.g., GET/...&theurl=http://slashdot.org) are identified as a set of nonstandard packets 54.
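The comparison described above may be sketched, purely as an example, as a per-input set difference. The sketch assumes that packets can be compared by exact string equality, which is a simplification (real traffic carries varying fields such as cookies or timestamps that would need normalizing first). The function name and the sample packets, including the host `spy.example`, are hypothetical.

```python
def find_nonstandard_packets(standard_table, actual_table):
    """Return, per user input, the packets seen on the unknown machine
    that have no counterpart in the standard behavior table."""
    nonstandard = {}
    for user_input, actual_packets in actual_table.items():
        standard_packets = standard_table.get(user_input, set())
        extra = [p for p in actual_packets if p not in standard_packets]
        if extra:
            nonstandard[user_input] = extra
    return nonstandard


# Hypothetical tables for a single user input.
standard_table = {"slashdot.org": {"slashdot.org: main page"}}
actual_table = {
    "slashdot.org": [
        "slashdot.org: main page",
        "spy.example: theurl=http://slashdot.org",
    ]
}
nonstandard = find_nonstandard_packets(standard_table, actual_table)
```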
The program 26 individually analyzes each set of nonstandard packets 54 with respect to server addresses 56 to which data will be sent. These server addresses 56 are compared by address matcher 58 to the server names found in the output packets 51 of the standard behavior table 48. Information indicating that a server address 56 is “unknown”, that is, not found in the standard behavior table 48, is sent to a spyware threat assessor 60 as will be described below.
The packets of each set of nonstandard packets 54 are also analyzed by correlator 62 with respect to the user inputs 53 that evoked the set of nonstandard packets 54, to determine whether there is a correlation between the user inputs 53 and the data 57 being conveyed by the set of nonstandard packets 54 to a remote site. Such correlation would tend to indicate that private user information is being embedded in an outgoing packet. The results of this comparison are also provided to the spyware threat assessor 60.
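One possible form of the address matching may be sketched as follows, assuming for illustration that each packet has already been parsed into a (server, data) pair. The function name `unknown_servers` and the sample hosts are hypothetical.

```python
def unknown_servers(standard_table, nonstandard_packets):
    """Return the servers addressed by nonstandard packets that never
    appear as a destination anywhere in the standard behavior table."""
    known = {
        server
        for packets in standard_table.values()
        for server, _data in packets
    }
    return {server for server, _data in nonstandard_packets if server not in known}


# Hypothetical data: packets are (server, data) pairs.
standard_table = {
    "www.google.com": {("www.google.com", "main page")},
    "www.apple.com": {("www.apple.com", "main page")},
}
nonstandard_packets = [
    ("spy.example", "theurl=http://www.google.com"),
    ("www.apple.com", "new image"),  # known server, merely a changed page
]
flagged = unknown_servers(standard_table, nonstandard_packets)
```

In this sketch a new request to an already known server is not flagged, which reflects the stated aim of avoiding false positives from minor modifications of benign web sites.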
For many spyware types, the user inputs 53 correlated by the correlator 62 with the data 57 of the set of nonstandard packets 54 may be the most recent user inputs 53. This short time window of comparison is possible because of a motivation of the designers of some types of spyware to react immediately to user inputs 53 for the delivery of advertisements targeted to the user inputs 53. Nevertheless, the time window of user inputs 53 need not be so limited, and previous user inputs 53 for an arbitrary time window may be considered.
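A minimal sketch of such a correlation test follows, under the assumption that correlation is approximated by a case-insensitive substring match after percent-decoding the packet data. This is only one simple choice and would not detect encrypted or otherwise transformed data. The function name, the `window` parameter, and the sample payload are hypothetical.

```python
from urllib.parse import unquote


def correlates(packet_data, user_inputs, window=1):
    """Return True if any of the last `window` user inputs appears in the
    percent-decoded packet data. `window=None` considers all inputs;
    otherwise `window` must be a positive integer."""
    if window is not None and window < 1:
        raise ValueError("window must be a positive integer or None")
    decoded = unquote(packet_data).lower()
    candidates = user_inputs if window is None else user_inputs[-window:]
    return any(user_input.lower() in decoded for user_input in candidates)


# Hypothetical outgoing data carrying a percent-encoded URL.
payload = "theurl=http%3A%2F%2Fslashdot.org"
recent_only = correlates(payload, ["slashdot.org", "www.google.com"], window=1)
any_time = correlates(payload, ["slashdot.org", "www.google.com"], window=None)
```

With `window=1` only the most recent input (`www.google.com`) is checked, so the earlier visit is missed; widening the window trades a higher detection chance for more comparisons.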
Multiple sets of nonstandard packets 54 associated with different user inputs 53 (for example www.apple.com and www.google.com) are then compared against each other to identify the longest common subsequence among the multiple sets of nonstandard packets 54. This longest common subsequence is extracted as a potential signature 64 and provided to the spyware threat assessor 60.
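As an illustrative sketch, the longest common subsequence of two packet strings may be computed with the standard dynamic-programming method. Folding that pairwise computation over more than two packets, as done in `extract_signature` below, is a heuristic assumed here for simplicity and is not guaranteed to yield the longest subsequence common to all packets. The function names and sample payloads are hypothetical.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of strings a and b (dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def extract_signature(packets):
    """Fold pairwise LCS over all packets (heuristic for three or more)."""
    return reduce(lcs, packets)


# Hypothetical nonstandard packets evoked by two different user inputs.
signature = extract_signature([
    "theurl=http://www.apple.com",
    "theurl=http://www.google.com",
])
```

In this example the part of the payload that does not depend on the user input (the `theurl=http://www.` prefix) survives into the candidate signature, while most of the input-specific text is dropped.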
The spyware threat assessor 60 operates according to the following Table 3 to output a signature 24 along signature transfer path 28 and/or to notify the user that there is a spyware infection as indicated by warning output 66 depending on the analysis of information from address matcher 58 and correlator 62.
Spyware is deemed most likely, and the highest score is therefore assigned, where the remote server address 56 is unknown and the user inputs 53 can be correlated to the data 57 of the packets 54. A “likely” rating is assigned where the server address is unknown but no correlation between data 57 and user inputs 53 can readily be established. This second case covers spyware that may, for example, encrypt the data it sends out from an infected machine. Finally, a spyware infection is least likely where the remote server address 56 is recognized; in this case it is immaterial whether user inputs 53 correlate to data 57. The user may select any score level to trigger a warning output 66 and/or a signature output over signature transfer path 28 depending on a desired level of security.
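The three-level ranking just described may be sketched as follows. The numeric values of the levels, the names `ThreatLevel`, `assess_threat`, and `should_warn`, and the default threshold are assumptions made for illustration; only the ordering of the three cases is taken from the description above.

```python
from enum import IntEnum


class ThreatLevel(IntEnum):
    LEAST_LIKELY = 0  # server address is recognized
    LIKELY = 1        # unknown server, no correlation found
    MOST_LIKELY = 2   # unknown server and data correlated with user inputs


def assess_threat(server_unknown, data_correlated):
    """Rank the likelihood of spyware from the two detector results."""
    if not server_unknown:
        # Correlation is immaterial when the server is recognized.
        return ThreatLevel.LEAST_LIKELY
    if data_correlated:
        return ThreatLevel.MOST_LIKELY
    return ThreatLevel.LIKELY


def should_warn(level, threshold=ThreatLevel.LIKELY):
    """Apply the user-selected score level that triggers a warning."""
    return level >= threshold


level = assess_threat(server_unknown=True, data_correlated=False)
warn = should_warn(level, threshold=ThreatLevel.MOST_LIKELY)
```

Here a security-conscious user could lower the threshold to `ThreatLevel.LIKELY` so that traffic to unknown servers triggers a warning even when its data cannot be correlated with the user inputs.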
Referring now to
Spyware detection program 26 is incorporated into the application program 42 to continuously receive inputs and outputs from both the standard browser 44 and the known clean browser 44′, which serve to provide the data of actual behavior table 50 and standard behavior table 48, respectively. With the possibility of continuous real-time operation, program 26 may provide an immediate warning of spyware behavior through warning output 66. Over time, multiple nonstandard packets 54 may be collected to extract a signature that may also be forwarded to another machine.
Referring now to
Referring now to
It is specifically intended that the present invention not be limited to the embodiments and illustrations contained herein, but include modified forms of those embodiments including portions of the embodiments and combinations of elements of different embodiments as come within the scope of the following claims. For the purpose of the claims, the term “computer” should be considered to refer not only to a unique processor but also to multiple processors sharing execution of a single task in a distributed processing environment. Likewise multiple computers should be interpreted to include multiple processors, or single processors executing multiple simultaneous tasks or sequential tasks, reflecting the understanding of those of ordinary skill in the art that one can arbitrarily divide or combine a computing task among one or more hardware platforms.
This application claims the benefit of U.S. Provisional Application 60/867,728 filed Nov. 29, 2006 and hereby incorporated by reference.
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/US07/85752 | 11/28/2007 | WO | 00 | 5/21/2009 |
| Number | Date | Country |
|---|---|---|
| 60867728 | Nov 2006 | US |