Claims
- 1. A method for detecting an anomalous operation of a computer system that executes a plurality of program modules, the method comprising:
(a) monitoring transitions between and among defined points within an internal operating environment on the computer system and producing program execution trace data; (b) comparing the program execution trace data with data indicative of a nominal operation of the computer system; and (c) identifying an anomalous operation of the computer system based on the result of the comparison.
- 2. A method as recited in claim 1, wherein the data indicative of a nominal operation of the computer system comprises a plurality of values (ra,b) and wherein the comparing act comprises:
- 3. A method as recited in claim 2, wherein each program module is associated with one of k different classes, and wherein said difference is considered only for one or more pairs of program modules (a,b) that satisfy the condition that a and b are members of the same class.
- 4. A method as recited in claim 3, wherein the comparing act further comprises:
determining the total difference: $d_t=\sum_{i=1}^{k}\sum_{a,b\in o_i}\left|r_{a,b}-r'_{a,b}\right|$, wherein $o_i$ is the set of program modules belonging to the ith one of said k different classes.
- 5. A method as recited in claim 4, wherein the total difference is computed based on corresponding triangles of the matrices represented by ra,b and r′a,b.
- 6. A method as recited in claim 5, wherein the total difference is determined by computing:
- 7. A method as recited in claim 2, wherein ra,b is computed according to the following formulas:
- 8. A method as recited in claim 2, wherein each era comprises the period during which invocation of any of the program modules has occurred a predetermined number (K) times, such that a new era begins each time that K invocations have taken place since the current era began.
- 9. A method as recited in claim 1, wherein each program module implements a predefined functional requirement.
- 10. A method as recited in claim 9, wherein each program module includes a mechanism for calling another module, and the method further comprises the use of a statistical methodology to identify a relatively small set of cohesive program modules that represent the dynamic bindings among program modules as they execute.
- 11. A method as recited in claim 10, wherein defined points are employed to monitor the activity of an executing program and to indicate an epoch in the execution of the program.
- 12. A method as recited in claim 11, further comprising recording, in an execution profile for the program, telemetry from the defined points at each epoch.
- 13. A method as recited in claim 12, wherein the execution profile comprises an n element vector (X) comprising at least one entry for each program module.
- 14. A method as recited in claim 13, wherein each element, xi, of said vector contains a frequency count for the number of times that the corresponding defined point mi has executed during an era of k epochs, where
- 15. A method as recited in claim 14, wherein an execution profile is recorded whenever the number of epochs, k, reaches a predefined count, K, at which time the contents of the execution profile vector are set to zero.
- 16. A method as recited in claim 15, wherein the recorded activity of the program during its last L=jK epochs is stored in a sequence of j execution profiles, X1, X2 . . . , Xj, where the value xi,j represents the frequency of execution of the ith program module on the jth execution profile.
- 17. A method as recited in claim 14, further comprising the step of reducing the size of the execution profiles from n, the number of defined points whose activity is highly correlated, to a smaller set of m virtual defined points whose activity is substantially uncorrelated.
- 18. A method as recited in claim 17, wherein the statistical technique of principal components analysis is employed to reduce the dimensionality of the execution profiles.
- 19. A method as recited in claim 17, wherein the statistical technique of principal factor analysis is employed to reduce the dimensionality of the execution profiles.
- 20. A method as recited in claim 17, wherein an n×j, j>n data matrix D=X1, X2, . . . , Xj is factored into m virtual orthogonal module components, where m is less than n, whereby the dimensionality is reduced from n to m.
- 21. A method as recited in claim 20, wherein an eigenvalue λi is associated with each of the m orthogonal components.
- 22. A method as recited in claim 21, wherein the eigenvalues satisfy the relation
- 23. A method as recited in claim 20, further comprising using a predefined stopping rule in determining a number of components extracted in an orthogonal structure representing an execution profile with reduced dimensionality.
- 24. A method as recited in claim 23, wherein the stopping rule is: extract all components whose eigenvalues are greater than a predefined threshold.
- 25. A method as recited in claim 23, wherein the stopping rule is: extract those components such that the proportion of variation represented by
- 26. A method as recited in claim 20, further comprising constructing a matrix (P), wherein said matrix is an n×m structure whose rows, p•j, contain values showing the degree of relationship of the variation of the ith program module and the jth factor or principal component.
- 27. A method as recited in claim 20, further comprising the step of forming a mapping vector (O) for at least one execution profile vector.
- 28. A method as recited in claim 27, wherein the mapping vector, O, comprises elements oj whose values are defined as follows:
let $q_i=\max_{1\le j\le m} p_{ij}$; let $o_j=\operatorname{index}(q_j)$ represent the column number in which the corresponding value $q_j$ occurs.
- 29. A method as recited in claim 28, wherein the mapping vector contains data to map probe event frequencies recorded in the execution profile vector onto corresponding virtual module equivalents.
- 30. A method as recited in claim 29, wherein a frequency count for each defined point k in an execution profile vector is represented by a value fk, and the mapping vector element ok contains the index value that k maps into.
- 31. A method as recited in claim 20, wherein m orthogonal sources of variation in the data vector D representing the original n defined points are identified.
- 32. A method as recited in claim 30, wherein, on each of the original raw execution profiles, the defined point frequency count is represented in the elements, xi,j of the profile vector, Xi.
- 33. A method as recited in claim 27, wherein a frequency count for each defined point k in an execution profile vector is represented by a value fk; wherein the mapping vector element ok contains the index value that k maps into; wherein the mapping vector contains data to map probe event frequencies recorded in the execution profile vector onto corresponding virtual module equivalents; and wherein, after the mapping vector has been established, a virtual profile vector (Yi) is employed to contain the frequency counts for interactions among virtual execution domain sets.
- 34. A method as recited in claim 33, wherein the virtual profile vector, Yi, is defined by:
- 35. A method for detecting an anomalous operation of a computer system that comprises a plurality of program modules, the method comprising:
(a) monitoring transitions between and among instrumentation points within an operating environment on the computer system, wherein said monitoring is performed by employing signals obtained from instrumented code in the program modules; (b) providing program instrumentation trace data representative of the transitions between and among program modules within a time frame; (c) identifying a relatively small set of virtual execution domains whose activity is substantially uncorrelated, and using this information to reduce the amount of trace data needed to detect anomalous activity; (d) comparing the reduced amount of trace data with predefined data indicative of a nominal operation of the computer system; and (e) identifying an anomalous operation of the computer system based on the result of the comparison.
- 36. A method as recited in claim 35, wherein the data indicative of a nominal operation of the computer system comprises a plurality of values (ra,b) and wherein the comparing act comprises:
- 37. A method as recited in claim 36, wherein each program module is associated with one of the virtual execution domains, and wherein said difference is considered only for one or more pairs of program modules (a,b) that satisfy the condition that a and b are members of the same class.
- 38. A method as recited in claim 37, wherein the comparing act further comprises:
determining the total difference: $d_t=\sum_{i=1}^{k}\sum_{a,b\in o_i}\left|r_{a,b}-r'_{a,b}\right|$, wherein there are k different virtual execution domains, and wherein $o_i$ is the set of program modules belonging to the ith one of said k different virtual execution domains.
- 39. A method as recited in claim 38, wherein the total difference is computed based on corresponding triangles of the matrices represented by ra,b and r′a,b.
- 40. A method as recited in claim 39, wherein the total difference is determined by computing:
- 41. A method as recited in claim 36, wherein ra,b is computed according to the following formulas:
- 42. A method as recited in claim 36, wherein each era comprises the period during which invocation of any of the program modules has occurred a predetermined number (K) times, such that a new era begins each time that K invocations have taken place since the current era began.
- 43. A method as recited in claim 35, wherein said program execution trace data is employed to provide an execution profile including a list of execution paths that have executed in a specified time frame and the frequencies of executions.
- 44. A computer system, comprising:
(a) a plurality of program modules; (b) monitoring means for monitoring transitions between and among defined points within the program modules, wherein said monitoring is performed by employing signals obtained from instrumented code in the program modules, and for providing trace data representative of the transitions between or among program modules within a time frame; (c) means for identifying a relatively small set of virtual execution domains whose activity is substantially uncorrelated, and using this information to reduce the amount of trace data needed to detect anomalous activity; (d) means for comparing the reduced amount of trace data with predefined data indicative of a nominal operation of the computer system; and (e) means for identifying an anomalous operation of the computer system based on the result of the comparison.
- 45. A computer system as recited in claim 44, wherein the data indicative of a nominal operation of the computer system comprises a plurality of values (ra,b) and wherein the means for comparing determines the difference
- 46. A computer system as recited in claim 45, wherein each program module is associated with one of the virtual execution domains, and wherein said difference is considered only for one or more pairs of program modules (a,b) that satisfy the condition that a and b are members of the same class.
- 47. A computer system as recited in claim 46, wherein the means for comparing determines the total difference:
- 48. A computer system as recited in claim 47, wherein the total difference is computed based on corresponding triangles of the matrices represented by ra,b and r′a,b.
- 49. A computer system as recited in claim 48, wherein the total difference is determined by computing:
- 50. A method as recited in claim 45, wherein ra,b is computed according to the following formulas:
- 51. A computer system as recited in claim 45, wherein each era comprises the period during which invocation of any of the program modules has occurred a predetermined number (K) times, such that a new era begins each time that K invocations have taken place since the current era began.
- 52. A system as recited in claim 44, wherein said trace data is employed to provide an execution profile including a list of execution paths that have executed in a specified time frame and the frequencies of executions.
- 53. A system as recited in claim 44, further comprising recording, in a first execution profile for the program, telemetry from the defined points at each epoch.
- 54. A system as recited in claim 53, wherein the first execution profile comprises an n element vector (X) comprising at least one entry for each program module, and wherein each element, xi, of said vector contains a frequency count for the number of times that the corresponding defined point mi has executed during an era of k epochs, where
- 55. A system as recited in claim 54, wherein the recorded activity of the program during its last L=jK epochs is stored in a sequence of j execution profiles, X1, X2, . . . , Xj, where the value xi,j represents the frequency of execution of the ith program module on the jth execution profile.
- 56. A system as recited in claim 55, further comprising the step of reducing the dimensionality of the execution profiles from n, the number of defined points whose activity is highly correlated, to a smaller set of m virtual points whose activity is uncorrelated.
- 57. A system as recited in claim 56, wherein an n×j, j>n data matrix D=X1, X2, . . . , Xj is factored into m virtual orthogonal module components, where m is less than n, whereby the dimensionality is reduced from n to m.
- 58. A method for evaluating the behavior of a computer program, the computer program comprising a plurality of program modules, the method comprising:
associating each of the program modules with one of a plurality of virtual modules; generating first data indicative of the normal behavior of the computer program, the first data comprising, for each pair of program modules (a,b), a value indicative of the correlation between the occurrence of invoking program module a and the occurrence of invoking program module b; based on an execution of the computer program, generating second data indicative of the behavior of the computer program during said execution, said second data comprising, for each pair of program modules (a,b), a value indicative of the correlation between the occurrence of invoking program module a and the occurrence of invoking program module b; comparing said first data with said second data by comparing the value in the first data associated with a pair of program modules (m1,m2) with the value in the second data associated with the same pair of program modules (m1,m2), wherein m1 and m2 are associated with the same virtual module; and determining whether the computer program exhibited normal behavior during said execution based on the result of said comparing act.
- 59. A method as recited in claim 58, wherein an execution profile of the computer program has been taken a plurality of times, and wherein a matrix X represents the results of the execution profiles, wherein each of the values of the matrix (xi,j) represents the number of times that the ith program module was invoked during the jth execution profile, and wherein the matrix is factored into a number m of virtual orthogonal components based on an eigenvalue (λi).
- 60. A method as recited in claim 59, wherein each of the virtual orthogonal components satisfies the condition:
- 61. A method as recited in claim 59, wherein each of the virtual orthogonal components satisfies the condition:
- 62. A method as recited in claim 58, wherein said associating act comprises:
determining a degree of correlation between each of the program modules and each of the virtual modules; associating each of the program modules with the virtual module with which the program module's degree of correlation is the highest.
- 63. A method as recited in claim 62, wherein a matrix P is an n×m matrix whose elements are pi,j, each element representing the degree of correlation between the ith program module and the jth virtual module, and wherein the virtual module with which a program module's correlation is the highest is determined by computing:
- 64. A method as recited in claim 58, wherein said first data comprise a matrix R, whose values are ra,b, each value ra,b representing a degree of correlation between the occurrence of invoking program module a and the occurrence of invoking program module b.
- 65. A method as recited in claim 64, further comprising:
partitioning the matrix R into m matrices R1, . . . , Rm, such that each partitioned matrix Ri comprises those elements ra,b of R that satisfy the condition that program module a and program module b are both associated with the ith virtual module.
- 66. A method as recited in claim 64, wherein each of the values ra,b is computed according to the following formulas:
- 67. A method as recited in claim 64, wherein the first data indicative of normal behavior of the computer program is generated by profiling the execution of the computer program a predetermined number of times (j), and wherein said second data comprise a matrix R′, each value r′a,b representing a degree of correlation between the occurrence of invoking program module a and the occurrence of invoking program module b based on the original j times that the execution of the computer program was profiled plus one additional execution profile of the computer program.
- 68. A method as recited in claim 67, wherein said comparing act comprises computing the value:
- 69. A method as recited in claim 67, wherein said comparing act comprises evaluating the differences between values in corresponding triangles of the matrices R and R′.
- 70. A method as recited in claim 69, wherein said comparing act comprises computing the value:
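
As an illustration only, and not as part of any claim, the comparison recited in claims 58, 62 through 65, 67 and 69 might be sketched in Python as follows. The formulas for ra,b are not reproduced in this text, so the Pearson correlation used here is an assumption, as are the function names, the loading matrix P and the sample execution profiles. The sketch maps each program module to the virtual module on which it loads most heavily, builds R from the baseline profiles and R′ from the baseline plus one additional profile, and sums the absolute differences over the upper triangle for pairs of modules that share a virtual module.

```python
from itertools import combinations
from math import sqrt


def mapping_vector(P):
    """For each module (row of P), return the column index of its largest loading."""
    return [max(range(len(row)), key=row.__getitem__) for row in P]


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (assumed form of r_ab).

    Returns 0.0 when either sequence is constant, to avoid dividing by zero.
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = sqrt(sum((x - mu) ** 2 for x in u))
    sv = sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv) if su and sv else 0.0


def correlation_matrix(profiles):
    """Build the n x n matrix of correlations from j profiles of n frequency counts."""
    n = len(profiles[0])
    cols = [[p[i] for p in profiles] for i in range(n)]
    return [[pearson(cols[a], cols[b]) for b in range(n)] for a in range(n)]


def total_difference(R, R_new, O):
    """Sum |r_ab - r'_ab| over pairs a < b that map to the same virtual module."""
    return sum(
        abs(R[a][b] - R_new[a][b])
        for a, b in combinations(range(len(O)), 2)
        if O[a] == O[b]
    )


# Hypothetical loadings of n = 4 program modules on m = 2 virtual modules.
P = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.95]]
O = mapping_vector(P)  # modules 0 and 1 share one class, modules 2 and 3 the other

# Hypothetical baseline: j = 4 execution profiles, one frequency count per module.
baseline = [[10, 20, 5, 1], [12, 24, 4, 2], [8, 16, 6, 3], [11, 22, 5, 2]]
R = correlation_matrix(baseline)


def score(new_profile):
    """Total difference between R and R' after adding one more execution profile."""
    R_new = correlation_matrix(baseline + [new_profile])
    return total_difference(R, R_new, O)
```

In this sketch a profile that preserves the baseline relationship between modules 0 and 1, such as `[9, 18, 5, 2]`, yields a score near zero, whereas a profile that breaks it, such as `[10, 2, 5, 2]`, yields a clearly larger score. The threshold at which a score is treated as anomalous is left open here.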
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a continuation-in-part of U.S. patent application Ser. No. 10/099,752, filed Mar. 15, 2002, “Method and System For Simplifying the Structure of Dynamic Execution Profiles,” which is a continuation-in-part of U.S. patent application Ser. No. 09/309,755, filed May 11, 1999, “Dynamic Software System Intrusion Detection,” both of which are hereby incorporated by reference in their entirety.
Continuation in Parts (2)

| | Number | Date | Country |
|---|---|---|---|
| Parent | 10099752 | Mar 2002 | US |
| Child | 10462462 | Jun 2003 | US |
| Parent | 09309755 | May 1999 | US |
| Child | 10099752 | Mar 2002 | US |