Claims
- 1. A method for constructing a computer virus detector, comprising:
- (a) providing a set of programs G known to be non-viral;
- (b) providing a set of programs V known to be viral;
- (c) forming a list of all features which appear in V greater than or equal to a specified number of times;
- (d) pruning the list of features to include only those which appear less than or equal to a specified threshold number of times in G;
- (e) constructing classifier input and output vectors based on the occurrence frequency of each feature in each member of the sets V and G;
- (f) training a classifier with the input and output vectors to discriminate between viral and non-viral programs.
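Steps (c) through (e) of claim 1 can be illustrated with a minimal sketch. It assumes byte trigrams as the features and assumes that "number of times" counts the distinct programs containing a feature; the claim itself fixes neither choice. The function names, thresholds, and toy byte strings are hypothetical.

```python
from collections import Counter

def ngrams(program: bytes, n: int = 3) -> set:
    """Return the distinct n-byte sequences found in one program."""
    return {program[i:i + n] for i in range(len(program) - n + 1)}

def select_features(viral, benign, min_viral=2, max_benign=0, n=3):
    """Steps (c)-(d): keep n-grams present in at least min_viral viral
    programs and in at most max_benign non-viral programs (assumed counting rule)."""
    viral_counts = Counter(g for p in viral for g in ngrams(p, n))
    benign_counts = Counter(g for p in benign for g in ngrams(p, n))
    return sorted(g for g, c in viral_counts.items()
                  if c >= min_viral and benign_counts[g] <= max_benign)

def to_vector(program: bytes, features) -> list:
    """Step (e): one input component per feature, here the number of
    non-overlapping occurrences of that feature in the program."""
    return [program.count(f) for f in features]

# Toy stand-ins for the sets V (viral) and G (non-viral); not real samples.
V = [b"\xfa\x33\xc0\x8e\xd0", b"\x90\xfa\x33\xc0\x8e", b"\x33\xc0\x8e\xd0\x00"]
G = [b"\xeb\x3c\x90\x33\xc0\x8e", b"\xeb\x3c\x90MSDOS"]

features = select_features(V, G)
X = [to_vector(p, features) for p in V + G]   # classifier input vectors
y = [1] * len(V) + [0] * len(G)               # output: 1 = viral, 0 = non-viral
```

In this toy data the trigram `33 c0 8e` occurs in all three viral programs but is pruned because it also appears in a non-viral program, which is the purpose of step (d).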
- 2. The method of claim 1, wherein the features are byte sequences ranging from n_min to n_max bytes in length.
- 3. The method of claim 2, wherein n_min = n_max = 3.
- 4. The method of claim 1, further comprising:
- pruning the list further to include a small feature set such that each program in the set V_train contains at least n_cover of the features.
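The claim does not say how the small covering feature set is found. One plausible approach, shown here purely as an assumption, is a greedy heuristic that repeatedly picks the feature contained in the most programs that still need coverage. The function name `greedy_cover` and the toy data are hypothetical.

```python
def greedy_cover(programs, features, n_cover=1):
    """Choose a small subset of features so that every program contains
    at least n_cover chosen features (or as many as it can, if it
    contains fewer than n_cover candidate features in total)."""
    # need[i]: how many more chosen features program i still requires.
    need = [min(n_cover, sum(f in p for f in features)) for p in programs]
    remaining = list(features)
    chosen = []
    while any(need) and remaining:
        # Pick the feature present in the most programs still lacking coverage.
        best = max(remaining,
                   key=lambda f: sum(1 for p, k in zip(programs, need)
                                     if k > 0 and f in p))
        remaining.remove(best)
        chosen.append(best)
        for i, p in enumerate(programs):
            if need[i] > 0 and best in p:
                need[i] -= 1
    return chosen

# Toy viral training programs and candidate trigram features (illustrative only).
V_train = [b"\xfa\x33\xc0\x8e\xd0", b"\x90\xfa\x33\xc0\x8e", b"\x33\xc0\x8e\xd0\x00"]
candidates = [b"\x33\xc0\x8e", b"\xc0\x8e\xd0", b"\xfa\x33\xc0"]

small_set = greedy_cover(V_train, candidates, n_cover=1)
```

A greedy pass does not guarantee the smallest possible set, but it is simple and usually yields a feature list far shorter than the pruned list it starts from.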
- 5. The method of claim 1, further comprising:
- designating the members of set G as either training programs, G_train, or as testing programs, G_test.
- 6. The method of claim 5, wherein the programs in sets G and V are boot sector programs.
- 7. The method of claim 5, further comprising pruning the list further to include a small feature set such that each program in the set V_train contains at least n_cover of the features.
- 8. The method of claim 5, wherein the steps of designating the members of the sets V and G as testing and training sets are performed using randomized selection criteria.
- 9. The method of claim 1, wherein the programs in sets G and V are boot sector programs.
- 10. The method of claim 9, further comprising:
- augmenting the set G with byte sequences from a set C of executable programs known to be non-viral.
- 11. A system for detecting viruses in a computer program, comprising:
- a processor;
- a neural network operating in the processor, the neural network being trained as follows:
- providing a set of programs G known to be non-viral;
- providing a set of programs V known to be viral;
- forming a list of all features which appear in V;
- pruning the list of features to include only those which appear less than or equal to a specified threshold number of times in G and which appear greater than or equal to a specified threshold number of times in V;
- pruning the list further to include a small feature set such that each program in the set V contains at least n_cover of the features;
- adjusting weights of the neural network with input and output vectors for each member of the sets V and G, such that the neural network discriminates between viral and non-viral programs;
- an input device for inputting a previously unknown sequence of data to the neural network; and
- an output device for outputting a determination of whether the unknown sequence is classified as viral or not viral.
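The weight-adjustment and classification elements of claim 11 can be sketched with the simplest possible network. The claim does not specify an architecture or learning rule; a single-layer perceptron with a threshold output is assumed here, and the function names, learning rate, and toy vectors are hypothetical.

```python
def train_perceptron(X, y, epochs=20, lr=1.0):
    """Adjust weights so the unit outputs 1 for viral input vectors and
    0 for non-viral ones (perceptron rule; an assumed learning rule)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - out            # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def classify(x, w, b):
    """Map a feature vector for an unknown byte sequence to a verdict."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "viral" if score > 0 else "not viral"

# Toy feature-count vectors: three viral programs, then two non-viral ones.
X = [[1, 1], [0, 1], [1, 0], [0, 0], [0, 0]]
y = [1, 1, 1, 0, 0]

w, b = train_perceptron(X, y)
verdict_unknown = classify([1, 0], w, b)   # vector of a previously unseen sequence
verdict_clean = classify([0, 0], w, b)
```

Because the toy vectors are linearly separable, the perceptron rule settles on weights that classify every training vector correctly; real feature sets may call for a multi-layer network and a held-out test set.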
CROSS-REFERENCE TO A RELATED PATENT APPLICATION
This patent application is a divisional patent application of U.S. patent application Ser. No. 08/242,757, filed May 13, 1994, now U.S. Pat. No. 5,675,711.
US Referenced Citations (4)
Foreign Referenced Citations (1)
Number | Date | Country
WO 9322723 | Nov 1993 | WOX
Divisions (1)
 | Number | Date | Country
Parent | 242757 | May 1994 |