Claims
- 1. A method for classifying an executable attachment in an email received at an email processing application of a computer system comprising:
a) at the email processing application, filtering said executable attachment from said email; b) extracting a byte sequence feature from said executable attachment; and c) classifying said executable attachment by comparing said byte sequence feature of said executable attachment with a classification rule set derived from byte sequence features of a set of executables having a predetermined class in a set of classes.
- 2. The method as defined in claim 1, wherein the step of extracting said byte sequence feature from said executable attachment comprises extracting static properties of said executable attachment.
- 3. The method as defined in claim 1, wherein the step of extracting said byte sequence feature from said executable attachment comprises converting said executable attachment from binary format to hexadecimal format.
- 4. The method as defined in claim 1, wherein the step of extracting said byte sequence feature from said executable attachment comprises creating a byte string representative of resources referenced by said executable attachment.
- 5. The method as defined in claim 1, wherein the step of classifying said executable attachment comprises determining a probability that said executable attachment is a member of each class in a set of classes consisting of malicious and benign.
- 6. The method as defined in claim 1, wherein the step of classifying said executable attachment comprises determining a probability that said executable attachment is a member of each class in a set of classes consisting of malicious, benign, and borderline.
- 7. The method as defined in claim 1, wherein the step of classifying said executable attachment comprises determining a probability that said executable attachment is a member of each class in said set of classes based on said byte sequence feature.
- 8. The method as defined in claim 7, wherein the step of classifying said executable attachment comprises determining said probability that said executable attachment is a member of each class in said set of classes with a Naive Bayes algorithm.
- 9. The method as defined in claim 7, wherein the step of classifying the executable attachment comprises determining said probability that said executable attachment is a member of a class in said set of classes with a Multi-Naive Bayes algorithm.
- 10. The method as defined in claim 9, which further comprises dividing said step of determining said probability into a plurality of processing steps and executing said processing steps in parallel.
- 11. The method as defined in claim 7, wherein the step of classifying the executable attachment comprises classifying said executable attachment as malicious if said probability that said executable attachment is malicious is greater than said probability that said executable attachment is benign.
- 12. The method as defined in claim 7, wherein the step of classifying the executable attachment comprises classifying said executable attachment as benign if said probability that said executable attachment is benign is greater than said probability that said executable attachment is malicious.
- 13. The method as defined in claim 7, wherein the step of classifying the executable attachment comprises classifying said executable attachment as borderline if a difference between said probability that said executable attachment is benign and said probability that said executable attachment is malicious is within a predetermined threshold.
- 14. The method as defined in claim 1, which further comprises logging said class of said executable attachment classified in said step c).
- 15. The method as defined in claim 14, wherein said step of logging said class of said executable attachment further comprises incrementing a count of said executable attachments classified as borderline.
- 16. The method as defined in claim 15, which further comprises, if said count of executable attachments exceeds a predetermined threshold, providing a notification that said threshold has been exceeded.
- 17. A method for classifying an executable program comprising:
a) training a classification rule set based on a predetermined set of known executable programs having a predetermined class and one or more byte sequence features by recording the number of known executable programs in each said predetermined class that have each of said byte sequence features; b) extracting a byte sequence feature from said executable program comprising converting said executable program from binary format to hexadecimal format; and c) determining the probability that the executable program is within each said predetermined class, based on said one or more byte sequence features in said executable program and said classification rule set.
- 18. The method as defined in claim 17, wherein the step of extracting said byte sequence feature from said executable program comprises extracting static properties of said executable program.
- 19. The method as defined in claim 17, wherein the step of determining the probability that the executable program is within each said predetermined class comprises determining the probability that the executable program is within said predetermined class in a set of classes consisting of malicious and benign.
- 20. The method as defined in claim 17, wherein the step of determining the probability that the executable program is within each said predetermined class comprises determining the probability that the executable program is within said predetermined class in a set of classes consisting of malicious, benign, and borderline.
- 21. The method as defined in claim 17, wherein the step of determining said probability that the executable program is within each said predetermined class comprises determining said probability that the executable program is within each said predetermined class with a Naive Bayes algorithm.
- 22. The method as defined in claim 17, wherein the step of determining said probability that the executable program is within each said predetermined class comprises determining said probability that the executable program is within each said predetermined class with a Multi-Naive Bayes algorithm.
- 23. The method as defined in claim 17, wherein the step of determining said probability that the executable program is within each said predetermined class comprises classifying said executable program as malicious if said probability that said executable program is malicious is greater than said probability that said executable program is benign.
- 24. The method as defined in claim 17, wherein the step of determining said probability that the executable program is within each said predetermined class comprises classifying said executable program as benign if said probability that said executable program is benign is greater than said probability that said executable program is malicious.
- 25. The method as defined in claim 17, wherein the step of determining said probability that the executable program is within each said predetermined class comprises classifying said executable program as borderline if a difference between said probability that said executable program is benign and said probability that said executable program is malicious is within a predetermined threshold.
- 26. The method as defined in claim 17, which further comprises logging said class of said executable program determined in said step c).
- 27. The method as defined in claim 26, wherein said step of logging said class of said executable program further comprises incrementing a count of said executable programs classified as borderline.
- 28. The method as defined in claim 27, which further comprises, if said count of executable programs exceeds a predetermined threshold, providing a notification that said threshold has been exceeded.
- 29. A system for classifying an executable attachment in an email received at a server of a computer system comprising:
a) an email filter configured to filter said executable attachment from said email; b) a feature extractor configured to extract a byte sequence feature from said executable attachment; and c) a rule evaluator configured to classify said executable attachment by comparing said byte sequence feature of said executable attachment to a classification rule set derived from byte sequence features of a set of executables having a predetermined class in a set of classes.
- 30. The system as defined in claim 29, wherein the feature extractor is configured to extract static properties of said executable attachment.
- 31. The system as defined in claim 29, wherein the feature extractor is configured to convert said executable attachment from binary format to hexadecimal format.
- 32. The system as defined in claim 29, wherein the feature extractor is configured to create a byte string representative of resources referenced by said executable attachment.
- 33. The system as defined in claim 29, wherein the rule evaluator is configured to predict the classification of said executable attachment as one class of a set of classes consisting of malicious and benign.
- 34. The system as defined in claim 29, wherein the rule evaluator is configured to predict the classification of said executable attachment as one class of a set of classes consisting of malicious, benign, and borderline.
- 35. The system as defined in claim 29, wherein the rule evaluator is configured to determine a probability that said executable attachment is a member of a class of said set of classes based on said byte sequence feature.
- 36. The system as defined in claim 35, wherein the rule evaluator is configured to determine said probability that said executable attachment is a member of one class of said set of classes with a Naive Bayes algorithm.
- 37. The system as defined in claim 35, wherein the rule evaluator is configured to determine said probability that said executable attachment is a member of a class of said set of classes with a Multi-Naive Bayes algorithm.
- 38. The system as defined in claim 35, wherein the rule evaluator is configured to divide a determination of said probability into a plurality of processing steps and to execute said processing steps in parallel.
- 39. The system as defined in claim 35, wherein the rule evaluator is configured to classify said executable attachment as malicious if said probability that said executable attachment is malicious is greater than said probability that said executable attachment is benign.
- 40. The system as defined in claim 35, wherein the rule evaluator is configured to classify said executable attachment as benign if said probability that said executable attachment is benign is greater than said probability that said executable attachment is malicious.
- 41. The system as defined in claim 35, wherein the rule evaluator is configured to classify said executable attachment as borderline if a difference between said probability that said executable attachment is benign and said probability that said executable attachment is malicious is within a predetermined threshold.
- 42. The system as defined in claim 29, which further comprises an email interface configured to log said class of said executable attachment classified by said rule evaluator.
- 43. The system as defined in claim 42, wherein said email interface is configured to increment a count of said executable attachments classified as borderline.
- 44. The system as defined in claim 43, wherein said email interface is configured to provide, if said count of executable attachments exceeds a predetermined threshold, a notification that said threshold has been exceeded.
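Claims 3, 17(b), and 31 recite converting the executable from binary to hexadecimal format as part of extracting byte sequence features. A minimal sketch follows, assuming that a feature is any overlapping run of `n` bytes written as lowercase hex. The default `n = 2` is an illustrative choice, and the function names are hypothetical.

```python
def to_hex(data: bytes) -> str:
    """Binary-to-hexadecimal conversion: two lowercase hex digits per byte."""
    return data.hex()

def byte_sequence_features(data: bytes, n: int = 2) -> set:
    """Distinct overlapping n-byte sequences of the program, as hex strings."""
    hex_text = to_hex(data)
    width = 2 * n  # n bytes occupy 2n hex digits
    # Step by 2 hex digits so every window starts on a byte boundary.
    return {hex_text[i:i + width] for i in range(0, len(hex_text) - width + 1, 2)}
```

The features are returned as a set, so each sequence counts at most once per program. That fits a training step that records how many programs contain a feature, rather than how often the feature occurs.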
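Claims 4 and 32 recite a byte string representative of the resources an executable references. One crude approximation assumes that referenced DLL names appear as plain ASCII somewhere in the file, and scans the raw bytes for names ending in `.dll`. A real implementation would more likely parse the PE import table. The pattern and function name below are hypothetical.

```python
import re

# Hypothetical pattern: ASCII runs ending in ".dll", matched case-insensitively.
DLL_NAME = re.compile(rb"[A-Za-z0-9_.-]+\.dll", re.IGNORECASE)

def resource_byte_string(data: bytes) -> bytes:
    """Sorted, de-duplicated, lowercased DLL names found in the binary, joined by ';'."""
    names = sorted({m.group(0).lower() for m in DLL_NAME.finditer(data)})
    return b";".join(names)
```

Sorting and lowercasing make the resulting string independent of the order and letter case in which names appear, so two programs importing the same DLLs yield the same feature.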
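Claims 7, 8, 17, and 21 recite training on known programs by recording, for each class, how many programs contain each byte sequence feature, and then computing a per-class probability with a Naive Bayes algorithm. The sketch below shows one such classifier. It assumes a Bernoulli (presence/absence) event model with Laplace smoothing. The class name, the smoothing, and the event model are illustrative choices, not details given in the claims.

```python
import math
from collections import Counter

class ByteSequenceNaiveBayes:
    """Bernoulli Naive Bayes over the presence or absence of byte-sequence features."""

    def __init__(self):
        self.program_counts = Counter()  # class -> number of training programs
        self.feature_counts = {}         # class -> Counter(feature -> programs containing it)
        self.vocabulary = set()

    def train(self, examples):
        """examples: iterable of (feature_set, class_label) for known programs."""
        for features, label in examples:
            features = set(features)
            self.program_counts[label] += 1
            # Each program adds at most 1 to a feature's count for its class.
            self.feature_counts.setdefault(label, Counter()).update(features)
            self.vocabulary |= features

    def class_probabilities(self, features):
        """Posterior probability of each class given the program's features."""
        present = set(features)
        total = sum(self.program_counts.values())
        log_scores = {}
        for label, n in self.program_counts.items():
            score = math.log(n / total)  # class prior
            for f in self.vocabulary:
                # Laplace-smoothed P(feature present | class)
                p = (self.feature_counts[label][f] + 1) / (n + 2)
                score += math.log(p if f in present else 1.0 - p)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long feature lists.
        top = max(log_scores.values())
        unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(unnorm.values())
        return {c: v / z for c, v in unnorm.items()}
```

Summing logarithms and normalizing at the end keeps the computation stable when a program has many features, where a direct product of probabilities could underflow to zero. Features never seen in training are ignored, since they are absent from the vocabulary.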
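Claims 9, 10, 22, 37, and 38 recite a Multi-Naive Bayes algorithm whose probability computation is divided into processing steps executed in parallel. The claims do not say how the work is divided, so the sketch below assumes one plausible scheme:

- The feature vocabulary is split into `k` disjoint partitions by a stable hash.
- One Bernoulli Naive Bayes sub-model is trained and scored per partition on a thread pool.
- The per-partition log-likelihoods are summed, and the class prior is applied once.

All names and the value of `k` are hypothetical.

```python
import math
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition_of(feature: str, k: int) -> int:
    # Stable hash (unlike str hash) so a feature always maps to the same sub-model.
    return zlib.crc32(feature.encode()) % k

def train_partition(examples, part: int, k: int):
    """Per-class program counts and feature counts restricted to one partition."""
    docs, counts, vocab = Counter(), {}, set()
    for features, label in examples:
        mine = {f for f in features if partition_of(f, k) == part}
        docs[label] += 1
        counts.setdefault(label, Counter()).update(mine)
        vocab |= mine
    return docs, counts, vocab

def score_partition(model, features):
    """Bernoulli Naive Bayes log-likelihood per class for one partition (no prior)."""
    docs, counts, vocab = model
    present = set(features) & vocab
    scores = {}
    for label, n in docs.items():
        s = 0.0
        for f in vocab:
            p = (counts[label][f] + 1) / (n + 2)  # Laplace smoothing
            s += math.log(p if f in present else 1.0 - p)
        scores[label] = s
    return scores

def multi_naive_bayes(examples, features, k: int = 4):
    """Train and score k sub-models concurrently, then combine into one posterior."""
    examples = list(examples)
    with ThreadPoolExecutor(max_workers=k) as pool:
        models = list(pool.map(lambda i: train_partition(examples, i, k), range(k)))
        partials = list(pool.map(lambda m: score_partition(m, features), models))
    docs = models[0][0]  # class counts are identical in every partition
    total = sum(docs.values())
    # Prior once, plus summed log-likelihoods (a product of per-partition probabilities).
    log_post = {c: math.log(docs[c] / total) + sum(p[c] for p in partials) for c in docs}
    top = max(log_post.values())
    unnorm = {c: math.exp(v - top) for c, v in log_post.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}
```

Because this model factorizes over features, combining disjoint partitions this way gives the same posterior as a single model over the whole vocabulary. The split therefore reduces per-worker memory and allows parallel work, but does not change the answer. In CPython, threads do not speed up CPU-bound scoring. A `ProcessPoolExecutor` would be needed for that, with the lambdas replaced by module-level functions so they can be pickled.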
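Claims 11 to 16, and system claims 39 to 44, turn the two class probabilities into a label and keep a running count of borderline results. The sketch below makes two assumptions. First, the borderline test is applied before the malicious/benign comparison. Second, "within a predetermined threshold" means an absolute difference no greater than a margin. The margin of 0.1, the alert threshold, and all names are hypothetical.

```python
from dataclasses import dataclass, field

def decide(p_malicious: float, p_benign: float, margin: float = 0.1) -> str:
    """Label an attachment from its malicious and benign probabilities."""
    if abs(p_malicious - p_benign) <= margin:
        return "borderline"  # too close to call either way
    return "malicious" if p_malicious > p_benign else "benign"

@dataclass
class ClassificationLog:
    """Logs each decision and notifies once when borderline results exceed a threshold."""
    alert_threshold: int = 3
    entries: list = field(default_factory=list)
    borderline_count: int = 0
    notifications: list = field(default_factory=list)

    def record(self, name: str, label: str) -> None:
        self.entries.append((name, label))
        if label == "borderline":
            self.borderline_count += 1
            # Notify only on the first crossing, not on every later borderline result.
            if self.borderline_count == self.alert_threshold + 1:
                self.notifications.append(
                    f"{self.borderline_count} borderline attachments exceed threshold {self.alert_threshold}"
                )
```

Notifying once per crossing is a design choice. An operator-facing deployment might instead reset the count per time window, or notify on every borderline result after the threshold is passed.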
CLAIM FOR PRIORITY TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application Serial Nos. 60/308,622, filed on Jul. 30, 2001, entitled “Data Mining Methods for Detection of New Malicious Executables” and No. 60/308,628, filed on Jul. 30, 2001, entitled “Malicious Email Filter,” which are hereby incorporated by reference in their entirety herein.
STATEMENT OF GOVERNMENT RIGHT
[0002] The present invention was made in part with support from the United States Defense Advanced Research Projects Agency (DARPA) grant nos. FAS-526617 and SRTSC-CU019-7950-1. Accordingly, the United States Government may have certain rights to this invention.
Provisional Applications (2)

| Number | Date | Country |
|---|---|---|
| 60308622 | Jul 2001 | US |
| 60308623 | Jul 2001 | US |