The present invention relates generally to verifying software functional or procedural integrity, and determining whether a software process is not functioning normally.
The software industry relies on virtually billions of lines of software to work as expected. Companies write code, sell code, and rely on processes embodied in the code that is distributed. Unfortunately, once the code leaves the control of the developing company, the code can often be changed or used outside the context for which it was intended.
For example, some software vendors distribute products for playing video or audio files on a user's computer. Some of these packages have internal protection mechanisms that allow video or audio to be protected when used in conjunction with the product. Unfortunately, if a user inserts their own code into the product, or runs another piece of software that captures the computer screen as a movie, they can obtain a copy of the content being played. This is not something that most vendors would like to see occur. Unfortunately, this action currently falls outside the context of the vendor's product and cannot be easily detected using available technology.
Thus, it is with respect to these considerations and others that the present invention has been made.
Non-limiting and non-exhaustive embodiments of the present invention are described with reference to the following drawings. In the drawings, like reference numerals refer to like parts throughout the various figures unless otherwise specified.
For a better understanding of the present invention, reference will be made to the following Detailed Description of the Preferred Embodiment, which is to be read in association with the accompanying drawings, wherein:
In the following detailed description of exemplary embodiments of the invention, reference is made to the accompanying drawings, which form a part hereof, and in which are shown, by way of illustration, specific exemplary embodiments in which the invention may be practiced. Each embodiment is described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the invention. The following detailed description is, therefore, not to be taken in a limiting sense.
Throughout the specification, the term “connected” means a direct connection between the things that are connected, without any intermediary devices or components. The term “coupled” means either a direct connection between the things that are connected, or an indirect connection through one or more passive or active intermediary devices or components. The meaning of “a,” “an,” and “the” includes plural references. The meaning of “in” includes “in” and “on.”
The present invention is directed at providing a method and system for differentiating between normal operational characteristics and abnormal (also called non-normal) operational characteristics such that software product tampering may be identified programmatically. Additionally, the invention provides for a variance for identification of behavior that is defined to be within the realm of normal user behavior.
It has been determined that abnormal behavior may be considered to be any behavior that is not normal behavior. Detecting normal behavior therefore provides the ability to detect abnormal behavior. Data is gathered, and when a sufficient amount of abnormal behavior has been detected, a signal may be provided such that any of a variety of actions may be performed. Such actions may include, but are not limited to, providing an alert message to a software provider, providing a warning message, shutting down a computing device or software process, and the like.
The invention obtains samples of predetermined traits needed to monitor the software for evidence of tampering. In most cases, this equates to a select number of system level calls that access resources that may be considered important, such as reading and writing to hard drives, memory, network resources, and the like.
Each predetermined trait is assigned a unique number. When a piece of software is running, it produces a stream of data identifying when predetermined traits that need to be monitored are utilized. Each predetermined trait is summarized from the data and statistical information about a trend associated with each trait may be produced.
The trends of the predetermined traits are compared to identified good trends to determine if they are normal. If there is not enough data to determine the trend of the traits exhibited, the result will be that the behavior is unknown. When there is enough data to make a determination, then the result may be normal or abnormal.
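The three possible outcomes described above (unknown, normal, abnormal) can be illustrated with a minimal sketch. The trait numbering, the `min_samples` threshold, the `tolerance` value, and the use of relative frequency as the "trend" are all assumptions made for illustration; the specification does not fix any of them.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    UNKNOWN = "unknown"
    NORMAL = "normal"
    ABNORMAL = "abnormal"


# Hypothetical trait numbering; the text only says each trait gets a unique number.
TRAIT_IDS = {"read": 1, "write": 2, "seek": 3}


def summarize(stream):
    """Return the relative frequency of each trait number seen in the stream."""
    counts = Counter(stream)
    total = sum(counts.values())
    return {trait: n / total for trait, n in counts.items()} if total else {}


def classify(stream, good_trend, min_samples=100, tolerance=0.1):
    """Compare observed trait frequencies against a known-good trend.

    min_samples and tolerance are assumed parameters, not values from the text.
    """
    if len(stream) < min_samples:
        return Verdict.UNKNOWN  # not enough data to establish a trend
    observed = summarize(stream)
    traits = set(observed) | set(good_trend)
    worst = max(abs(observed.get(t, 0.0) - good_trend.get(t, 0.0)) for t in traits)
    return Verdict.NORMAL if worst <= tolerance else Verdict.ABNORMAL
```

A short stream yields `Verdict.UNKNOWN` regardless of its content, which mirrors the case where there is not enough data to determine the trend.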
Operating Environment
The client 103 may be a computing device, such as a portable computer, desktop computer, personal digital assistant (PDA), media player, or other similar device on which a software application may be exposed to tampering. One embodiment of client 103 is described in more detail below, in conjunction with
The network 102 can employ any form of computer readable media for communicating information from one electronic device to another. Also, network 102 can include the Internet in addition to local area networks (LANs), wide area networks (WANs), direct connections, such as through a universal serial bus (USB) port, other forms of computer-readable media, or any combination thereof. On an interconnected set of LANs, including those based on differing architectures and protocols, a router acts as a link between LANs, enabling messages to be sent from one to another. Also, communication links within LANs typically include twisted wire pair or coaxial cable, while communication links between networks may utilize analog telephone lines, full or fractional dedicated digital lines including T1, T2, T3, and T4, Integrated Services Digital Networks (ISDNs), Digital Subscriber Lines (DSLs), wireless links including satellite links, or other communications links known to those skilled in the art. Furthermore, remote computers and other related electronic devices can be remotely connected to either LANs or WANs via a modem and temporary telephone link. A remote computer may act in a number of ways, including as a WWW (content) server or a client with a browser application program.
In
Therefore, the operating environment shown in
Computer 500 includes processing unit 512, video display adapter 514, and a mass memory, all in communication with each other via bus 522. The mass memory generally includes RAM 516, ROM 532, and one or more permanent mass storage devices, such as hard disk drive 528, tape drive, optical drive, and/or floppy disk drive. The mass memory stores operating system 520 for controlling the operation of computer 500. Any general-purpose operating system may be employed. Basic input/output system (“BIOS”) 518 is also provided for controlling the low-level operation of computer 500.
As illustrated in
The mass memory as described above illustrates another type of computer-readable media, namely computer storage media. Computer storage media may include volatile, nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computing device.
In one embodiment, the mass memory stores program code and data for implementing operating system 520. The mass memory may also store additional program code and data for performing the functions of computer 500, and tamper detector 502. One or more applications, such as application 104 when computer 500 is employed to illustrate client 103 of
While computer 500 illustrates tamper detector 502 as a component, the present invention is not so limited. For example, tamper detector 502 may operate within a server, such as server 101 in
Computer 500 may also include an SMTP handler application for transmitting and receiving e-mail, an HTTP handler application for receiving and handling HTTP requests, and an HTTPS handler application for handling secure connections. The HTTPS handler application may initiate communication with an external application in a secure fashion.
Computer 500 also includes input/output interface 524 for communicating with external devices, such as a mouse, keyboard, scanner, or other input devices not shown in
Generalized Operation
Described below is one embodiment of a process flow for employing graphical representations, such as chaos game representations, and the like, for evaluating evidence of tampering of software.
The process begins by determining the main components related to the application processes of interest.
Virtually every application process may be considered as a set of system calls, which include organized system sequences, and which may be ranked by a limited length. When comparing two different data sets, where one represents the pattern and the other data set represents a prototype, the process employs two major components of that data, Consequence and Frequency. Comparing the two different data sets includes a comparison of consequences and frequencies of certain system calls. However, the present invention is not limited to consequence and frequency, and other components of the data may be employed without departing from the scope of the invention.
The terms “pattern,” “pattern data,” and the like, refer to a set of data that represents a behavior that is determined to be normal for a given software process, application, and the like. The terms, “prototype,” “prototype data,” and the like refer to another set of data that represents a behavior of the given software process, application, and the like. Prototype data may be compared to the pattern data to determine whether the software process is considered to have been tampered with.
Creating a Fingerprint as a Consequence Pattern
The process begins by calculating consecutive data for a sequence of file system calls for what is considered to be an unhacked (normal) application process. Various approaches related to graphing representations, and the like, may be employed for analyzing the data. In one embodiment Chaos Game Representation (CGR) is employed to convert the calculated data to CGR dot-data. The obtained CGR dot-data is then converted to the radius-vector data as a fingerprint.
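As an illustration of the conversion described above, the following sketch builds CGR dot-data from a sequence of system calls and then expresses each dot as a radius vector. It assumes the common chaos-game construction (each call type owns a vertex on the unit circle, and each call moves the current point halfway toward its vertex); the function names, the halfway ratio, and the vertex layout are assumptions, not details taken from the specification.

```python
import math


def cgr_points(calls, alphabet):
    """Chaos Game Representation: one 2-D point per system call.

    Each call type owns a vertex on the unit circle; starting at the
    center, every call moves the current point halfway to its vertex.
    """
    k = len(alphabet)
    vertex = {
        name: (math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
        for i, name in enumerate(alphabet)
    }
    x, y = 0.0, 0.0
    points = []
    for call in calls:
        vx, vy = vertex[call]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


def radius_vectors(points):
    """Express each point as the (length, angle) of a vector from the origin."""
    return [(math.hypot(x, y), math.atan2(y, x)) for x, y in points]
```

Because every step is a midpoint between a point inside the unit circle and a vertex on it, all generated points stay within the unit circle.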
Vector analysis rules are employed for processing the data. The newly created radius vectors are replaced with a single vector that serves as the pattern derived from the fingerprint. The pattern is then trained to better match the real situation.
When the created vector becomes saturated or matured, the X and Y coordinates of the ending point of the created vector (the origin point of the created vector already has (0, 0) coordinates assigned by default) are retained. A general condition that may indicate that the vector is saturated and mature includes the event when a difference between two measurements of sequence errors, ER1 and ER2, for substantially the same pattern or prototype is equal to or less than a preliminarily assigned value Val:
|ER1−ER2|<=Val
Next, the process verifies the vector stability through the next 50-100 new system calls. When the vector stability is verified, the number of system calls is assigned to one vector, and the process starts again at creating new dot-data until the new vector is generated. Once the stable vector has been processed, the statistics are reinitialized in preparation for the next vector. By continuing the process, a sequence of vectors is created, which represent all changes in the sequence of the data that have occurred.
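The saturation condition and the follow-up stability check can be sketched as below. The saturation test is the |ER1−ER2|<=Val condition from the text; the way stability is verified here (requiring the same condition for every consecutive pair of the most recent error measurements) is only an assumed interpretation, and `window` is a hypothetical parameter standing in for the 50-100 system calls mentioned.

```python
def is_saturated(er1, er2, val):
    """Condition from the text: |ER1 - ER2| <= Val."""
    return abs(er1 - er2) <= val


def is_stable(errors, val, window=50):
    """Assumed stability check over the last `window` error measurements.

    Every consecutive pair among the most recent window + 1 measurements
    must satisfy the saturation condition.
    """
    if len(errors) < window + 1:
        return False  # not enough measurements yet
    recent = errors[-(window + 1):]
    return all(is_saturated(a, b, val) for a, b in zip(recent, recent[1:]))
```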
Creating a Fingerprint as a Frequency Pattern
A fingerprint for the frequency pattern data may be obtained by calculating a sum of substantially every type of system call that appears in the data processing as well as the total number of system calls that are made.
A vector is created by adding up all of the average directional vectors.
The maximal range of such determined values of system calls as Write, Read, Seek, or the like is obtained. These are retained as pattern values.
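A minimal sketch of the counting step follows. It tallies each type of system call and the total number of calls, then keeps the largest count seen per call type across several samples. Reading "maximal range" as the maximum count per call type is an assumption, and the function names are illustrative only.

```python
from collections import Counter


def frequency_fingerprint(calls):
    """Count each system-call type and the total number of calls."""
    per_type = Counter(calls)
    return per_type, sum(per_type.values())


def max_ranges(samples):
    """Largest count seen for each call type across several samples.

    Treating the 'maximal range' as a per-type maximum is an assumption.
    """
    maxima = {}
    for sample in samples:
        per_type, _ = frequency_fingerprint(sample)
        for name, n in per_type.items():
            maxima[name] = max(maxima.get(name, 0), n)
    return maxima
```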
The process continues to create new data until the next saturated and mature vector representing the consequence pattern appears. By continuing the process, a sequence of patterns is created that represents substantially all changes in frequency that occurred in the data, virtually in real time, for each kind of system call.
Creating Fingerprints in Real Time as the Prototypes to Consequence and Frequency Patterns
The process continues by calculating the consequence data of the file system calls and frequency numbers for the tested application virtually in real time. The process steps described above for creating consequence and frequency patterns are continued.
Decision-Making
The process next creates a way of analyzing and comparing the real time data to the pattern data. The process runs the decision engine to obtain the results. Each pattern, created as described above, is created separately for the applications of any known kinds and any known computer platforms the user might use and for which the integrity is sought.
Consequence Pattern and Prototype Creation
Initially, the Chaos Game Representation (CGR) scatter plot is determined. Next, the process creates the X and Y coordinate axes through the center of the circle. An assumption may be made that virtually every point is treated as a radius vector with its origin at the point (0, 0).
The set of created vectors is considered a fingerprinting field for the patterns. To create a pattern-vector Ap and keep it within the unit circle, the process determines the average sum of all vectors Aj (j=1,N) created on the CGR:
$$Ap = \frac{1}{N}\sum_{j=1}^{N} Aj$$
where N is a number of all points in the CGR.
In coordinates form, the Ap vector is:
Ap{Xp,Yp}
The newly obtained vector Ap inherits the information from each of the N vectors. The previously created vectors (or points) are replaced with one newly created vector Ap with Xp and Yp coordinates. The new vector, Ap, accumulates substantially all sets of behaviors represented by the CGR. Thus, only the coordinates of one radius vector are retained for each created pattern rather than thousands of points. This improves the efficiency of memory usage and software performance. The process of obtaining the pattern-vector requires fewer computer operations such as additions.
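The reduction of many CGR points to one pattern vector can be sketched as a simple component-wise average. The function name and the tuple representation of points are assumptions made for illustration.

```python
def pattern_vector(points):
    """Average N radius vectors (all rooted at the origin) into one vector.

    Returns the (Xp, Yp) coordinates of the end point of the averaged vector.
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one CGR point is required")
    xp = sum(x for x, _ in points) / n
    yp = sum(y for _, y in points) / n
    return xp, yp
```

Because the result is an average of points that each lie within the unit circle, it also lies within the unit circle, and only two numbers need to be stored per pattern.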
Frequency Pattern and Prototype Creation
The process for creating frequency patterns or prototypes is described above. The process calculates and accumulates the sum for virtually each type of system call that occurred in each sample of a given length.
A CGR is created employing the following process. Typically, the number of points K that are equal to or less than the number of possible or chosen system calls is employed. Virtually every newly calculated point for virtually each kind of system call may be located on the line between the center and the point on the circle that represents this kind of system call. Virtually every point represents the vertex of a vector in the given direction (that is, at the given angle).
The length L1 of the newly created vector represents the frequency of its appearance and can be determined from the formula:
where TNC is a total number of system calls, Pi is a single point, and M is a total number of calculated points for this type of system call.
Furthermore,
In creating a pattern vector Pp (employing substantially the same approach for the prototype vector), an average sum of all vectors Pj (j=1, K) created on the CGR is determined:
$$Pp = \frac{1}{K}\sum_{j=1}^{K} Pj$$
where K is a number of all system calls (vectors) in the CGR. In the coordinate form, the Pp vector substantially is:
Pp{Xp,Yp}
The newly obtained vector Pp inherits the information from the K vectors.
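The frequency-side construction can be sketched in the same spirit. Each call type is given a fixed direction on the circle, one vector per call type is produced along that direction, and the K vectors are averaged into Pp. The vector length used here (the call type's share of the total number of calls) is only a stand-in chosen for illustration and is not the specification's formula for L1; the function names and the even angular spacing are likewise assumptions.

```python
import math
from collections import Counter


def frequency_vectors(calls, alphabet):
    """One vector per call type, pointing at that type's direction on the circle.

    Assumption: the length is the relative frequency count / TNC, used here
    only as a placeholder for the length formula in the specification.
    """
    k = len(alphabet)
    counts = Counter(calls)
    tnc = len(calls)  # total number of system calls
    vectors = []
    for i, name in enumerate(alphabet):
        length = counts[name] / tnc if tnc else 0.0
        angle = 2 * math.pi * i / k
        vectors.append((length * math.cos(angle), length * math.sin(angle)))
    return vectors


def frequency_pattern(vectors):
    """Average the K per-type vectors into a single vector Pp = (Xp, Yp)."""
    k = len(vectors)
    if k == 0:
        raise ValueError("at least one vector is required")
    return (sum(x for x, _ in vectors) / k, sum(y for _, y in vectors) / k)
```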
Determining Normal from Non-Normal Behavior
When incoming data with real time points needs to be processed, the fingerprinting fields of the consequence vector-prototype and the frequency vector-prototype are produced. These are compared to the corresponding pattern vectors. A final decision may then be determined by analyzing the results.
The process, which is virtually the same for all comparisons, is given below.
Two different vectors Ap and Bpr are compared by their norms and by the angle between them. With Ap{X1, Y1} and Bpr{X2, Y2}, the angle θ between the two given vectors can be represented as the following:
$$\cos\theta = \frac{X1 \cdot X2 + Y1 \cdot Y2}{\sqrt{{X1}^2 + {Y1}^2}\,\sqrt{{X2}^2 + {Y2}^2}}$$
Consideration is typically given to the upper (numerator) part of the given fraction because it has practical influence on the preliminary results of analysis and more weight for the further decisions. For example, the following three situations may arise, where the numerators could be as follows:
X1*X2+Y1*Y2=0 (1)
X1*X2+Y1*Y2<0 (2)
X1*X2+Y1*Y2>0 (3)
Equation 1 above indicates that the angle between vectors Ap and Bpr is 90 degrees; that is, the vectors are perpendicular.
Equation 2 above indicates that the angle between vectors Ap and Bpr exceeds 90 degrees, so the vectors point in generally opposite directions.
Equations 1 and 2 thus indicate substantial abnormality, and as such a decision can be determined immediately. Equation 3 above requires further analysis before a final decision, and is explained below.
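The preliminary check on the sign of X1*X2+Y1*Y2 can be sketched as follows. The function name and the returned labels are illustrative assumptions; only the three sign cases come from the text.

```python
def direction_check(ap, bpr):
    """Classify two vectors by the sign of X1*X2 + Y1*Y2."""
    x1, y1 = ap
    x2, y2 = bpr
    dot = x1 * x2 + y1 * y2
    if dot == 0:
        return "perpendicular"  # case 1: substantial abnormality
    if dot < 0:
        return "opposed"  # case 2: substantial abnormality
    return "needs-further-analysis"  # case 3: continue with the norm-based comparison
```

With floating-point coordinates an exact zero is rare, so a practical implementation would likely compare against a small tolerance; that refinement is left out of this sketch.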
One approach for determining normal from non-normal behavior with respect to equation 3 above that enables reasonable and confident results is described next.
The approach begins by determining a norm N1 of the sum of vectors Ap and Bpr:
$$N1 = |Ap + Bpr| = \sqrt{(X1+X2)^2 + (Y1+Y2)^2}.$$
Next, a norm N2 of the difference of vectors Ap and Bpr is determined as:
$$N2 = |Ap - Bpr| = \sqrt{(X1-X2)^2 + (Y1-Y2)^2}$$
It is then determined how similar vectors Ap and Bpr are in direction:
E1=1−(N1−N2)/N1=N2/N1.
The norms Np and Npr of vectors Ap and Bpr are determined by:
$$Np = \sqrt{{X1}^2 + {Y1}^2}$$
$$Npr = \sqrt{{X2}^2 + {Y2}^2}.$$
Similarity of vectors Ap and Bpr may be determined by employing their lengths, using the following equation:
E2=|Np−Npr|/Np.
One way to determine a total difference, ER, between vectors Ap and Bpr may also be represented as:
ER=max(E1,E2).
The total difference may also be represented by a percentage measurement, as:
ER %=max(E1,E2)*100%.
Next, a Confidence level, CL, or fitting Probability, FP, may be determined by:
CL=FP=1−ER.
Similarly, a percentage of the confidence level may be determined as:
CL %=FP %=100%−ER %.
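The quantities N1, N2, E1, E2, ER, and CL defined above can be computed together, as in the following sketch. The function name, the dictionary return value, and the guards against division by zero are assumptions added for illustration.

```python
import math


def compare(ap, bpr):
    """Compute E1, E2, ER and CL for pattern vector Ap and prototype vector Bpr."""
    x1, y1 = ap
    x2, y2 = bpr
    n1 = math.hypot(x1 + x2, y1 + y2)  # N1 = |Ap + Bpr|
    n2 = math.hypot(x1 - x2, y1 - y2)  # N2 = |Ap - Bpr|
    n_p = math.hypot(x1, y1)           # Np  = |Ap|
    n_pr = math.hypot(x2, y2)          # Npr = |Bpr|
    if n1 == 0 or n_p == 0:
        raise ValueError("N1 and Np must be non-zero")
    e1 = n2 / n1                       # E1 = 1 - (N1 - N2)/N1 = N2/N1
    e2 = abs(n_p - n_pr) / n_p         # E2 = |Np - Npr| / Np
    er = max(e1, e2)                   # ER = max(E1, E2)
    return {"E1": e1, "E2": e2, "ER": er, "CL": 1 - er}
```

For example, identical vectors give ER = 0 and CL = 1, while Ap = (1, 0) and Bpr = (0.5, 0) give E1 = 1/3, E2 = 0.5, and therefore ER = 0.5 and CL = 0.5.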
Now, if the value for the confidence level shows that the maximum difference between vectors Ap and Bpr, and their direction, can be trusted, then a determination may be made as to whether the process's behavior is normal or non-normal. To determine a final result, one of the a priori set values mentioned above may be compared with the corresponding value obtained during the newest calculation, such as ER, ER %, CL, FP, CL %, or FP %.
A degree of trust may be obtained for the result based in part on the maximal error. Thus, for example, if the maximal error calculated is inside or on the border of the preliminary assigned error interval, the result may be determined to indicate normal behavior of the software process.
When the results of all comparisons (say 2×3=6) are obtained, then the process creates the procedure for the final decision.
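A minimal sketch of the threshold test and one possible way of combining several comparison results is given below. The test that an error inside or on the border of the assigned interval indicates normal behavior follows the text; the combining rule (a required fraction of normal results, defaulting to all of them) is purely hypothetical, since the specification does not state how the final decision procedure is formed.

```python
def is_normal(er, max_error):
    """Normal when the error lies inside or on the border of [0, max_error]."""
    return 0.0 <= er <= max_error


def final_decision(errors, max_error, required_fraction=1.0):
    """Hypothetical combining rule over several comparison errors.

    required_fraction is an assumed parameter: the share of comparisons
    that must be normal for the overall behavior to be called normal.
    """
    if not errors:
        return "unknown"
    normal = sum(is_normal(er, max_error) for er in errors)
    return "normal" if normal / len(errors) >= required_fraction else "non-normal"
```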
While the above disclosure employed Chaos Game Representations, those skilled in the art will recognize that the invention is not limited to such implementation and other graphical representation schemes may be employed without departing from the scope or spirit of the invention.
The above specification, examples, and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
This application claims the benefit of U.S. Provisional Application No. 60/412,265 filed on Sep. 20, 2002, which is hereby claimed under 35 U.S.C. §119(e).
Number | Date | Country | |
---|---|---|---|
20040153873 A1 | Aug 2004 | US |
Number | Date | Country | |
---|---|---|---|
60412265 | Sep 2002 | US |