Claims
- 1. A method for automatic interpretation of system data utilized to create a policy document, said method comprising:
filtering through each unknown event in a database of unknown events; selectively determining which unknown events among all of the unknown events in said database should be considered for inclusion in said policy document; and updating the policy document with event examples associated with the unknown events.
- 2. The method of claim 1, further comprising:
analyzing said unknown events within said database; and identifying trends within said database of unknown events based on a historical treatment of said unknown events.
- 3. The method of claim 1, further comprising:
outputting suggested elements from said database for inclusion in said policy document.
- 4. The method of claim 1, wherein said database is a text file configured with a single event per line of text.
- 5. The method of claim 4, further comprising:
converting said text to a code form that is readable by a clustering algorithm; and passing said code form to said clustering algorithm.
- 6. The method of claim 5, further comprising:
grouping similar elements of said database into subgroups; and generating a feature vector for said subgroups with an associated set of valid values.
- 7. The method of claim 5, wherein said converting step converts events in the database into examples for said clustering algorithm.
- 8. The method of claim 7, wherein said converting step is completed by a parser function and includes:
dividing the database into tokens, wherein each line within said database is split into words separated by a blank space; collecting a single occurrence of each token, wherein duplication of tokens within a collected dictionary of tokens is substantially avoided; converting each line of said database into an example vector; and collecting output examples into event examples.
- 9. The method of claim 8, further comprising:
determining the line with a greatest number of tokens; and assigning the number of tokens of said line as the length of each token vector to be utilized to convert events of the database into examples for said clustering algorithm, wherein each said example vector has a length equal to said length.
- 10. The method of claim 8, further comprising:
combining similar event examples into clusters; outputting said clusters to a user; and enabling user manipulation of parameters of said clustering algorithm to produce different clusters.
- 11. The method of claim 9, wherein IP addresses are replaced with surrogate general expressions.
- 12. A computer system for automatic interpretation of system data utilized to create a policy document, said system comprising:
means for filtering through each unknown event in a database of unknown events; means for selectively determining which unknown events among all of the unknown events in said database should be considered for inclusion in said policy document; and means for updating the policy document with event examples associated with the unknown events.
- 13. The system of claim 12, further comprising:
means for analyzing said unknown events within said database; and means for identifying trends within said database of unknown events based on a historical treatment of said unknown events.
- 14. The system of claim 12, further comprising:
means for outputting suggested elements from said database for inclusion in said policy document.
- 15. The system of claim 12, wherein said database is a text file configured with a single event per line of text.
- 16. The system of claim 15, further comprising:
means for converting said text to a code form that is readable by a clustering algorithm; and means for passing said code form to said clustering algorithm.
- 17. The system of claim 16, further comprising:
means for grouping similar elements of said database into subgroups; and means for generating a feature vector for said subgroups with an associated set of valid values.
- 18. The system of claim 16, wherein said means for converting converts events in the database into examples for said clustering algorithm.
- 19. The system of claim 18, wherein said means for converting comprises a parser function and includes:
means for dividing the database into tokens, wherein each line within said database is split into words separated by a blank space; means for collecting a single occurrence of each token, wherein duplication of tokens within a collected dictionary of tokens is substantially avoided; means for converting each line of said database into an example vector; and means for collecting output examples into event examples.
- 20. The system of claim 19, further comprising:
means for determining the line with a greatest number of tokens; and means for assigning the number of tokens of said line as the length of each token vector to be utilized to convert events of the database into examples for said clustering algorithm, wherein each said example vector has a length equal to said length.
- 21. The system of claim 19, further comprising:
means for combining similar event examples into clusters; means for outputting said clusters to a user; and means for enabling user manipulation of parameters of said clustering algorithm to produce different clusters.
- 22. The system of claim 20, wherein IP addresses are replaced with surrogate general expressions.
- 23. A method for data mining a database of unknown events to update a policy document for use by a system administrator in a computer system, comprising:
- 24. A computer program product, comprising:
a computer readable medium; and program code on said computer readable medium for implementing automatic interpretation of system data utilized to create a policy document, said program code comprising code for: filtering through each unknown event in a database of unknown events; selectively determining which unknown events among all of the unknown events in said database should be considered for inclusion in said policy document; and updating the policy document with event examples associated with the unknown events.
- 25. The computer program product of claim 24, further comprising program code for:
analyzing said unknown events within said database; and identifying trends within said database of unknown events based on a historical treatment of said unknown events.
- 26. The computer program product of claim 24, further comprising program code for outputting suggested elements from said database for inclusion in said policy document.
- 27. The computer program product of claim 24, wherein said database is a text file configured with a single event per line of text.
- 28. The computer program product of claim 27, further comprising program code for:
converting said text to a code form that is readable by a clustering algorithm; and passing said code form to said clustering algorithm.
- 29. The computer program product of claim 28, further comprising program code for:
grouping similar elements of said database into subgroups; and generating a feature vector for said subgroups with an associated set of valid values.
- 30. The computer program product of claim 28, wherein said program code for said converting step converts events in the database into examples for said clustering algorithm.
- 31. The computer program product of claim 30, wherein said program code for said converting step comprises a software parser function, which includes program code for:
dividing the database into tokens, wherein each line within said database is split into words separated by a blank space; collecting a single occurrence of each token, wherein duplication of tokens within a collected dictionary of tokens is substantially avoided; converting each line of said database into an example vector; and collecting output examples into event examples.
- 32. The computer program product of claim 31, further comprising program code for:
determining the line with a greatest number of tokens; and assigning the number of tokens of said line as the length of each token vector to be utilized to convert events of the database into examples for said clustering algorithm, wherein each said example vector has a length equal to said length.
- 33. The computer program product of claim 31, further comprising program code for:
combining similar event examples into clusters; outputting said clusters to a user; and enabling user manipulation of parameters of said clustering algorithm to produce different clusters.
- 34. The computer program product of claim 32, further comprising program code for replacing IP addresses with surrogate general expressions.
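Claims 11, 22 and 34 recite replacing IP addresses with surrogate general expressions, so that events differing only in an address can be treated alike. The claims do not define the surrogate. The following Python sketch is one possible reading, not the claimed implementation: it assumes dotted-quad IPv4 addresses only and uses a hypothetical placeholder token `IPADDR` as the surrogate.

```python
import re

# Dotted-quad IPv4 pattern; IPv6 and other address forms are not handled here.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical surrogate token; the claims do not specify its form.
IP_SURROGATE = "IPADDR"


def generalize_ips(line):
    """Replace every IPv4 address in an event line with the surrogate token."""
    return IPV4.sub(IP_SURROGATE, line)
```

For example, `generalize_ips("deny tcp 10.0.0.5 port 22")` yields `"deny tcp IPADDR port 22"`. A surrogate could equally be a regular expression matching any address; a plain token is used here only because it stays a single blank-separated word for the later tokenizing step.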
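Claims 8 and 9 (and their counterparts 19, 20, 31 and 32) describe a parser that splits each event line on blank spaces, builds a dictionary holding one occurrence of each token, and converts every line into an example vector whose length equals the token count of the longest line. The Python sketch below illustrates those steps under stated assumptions: tokens are encoded as integer ids in first-seen order, and shorter lines are padded with a hypothetical filler id `PAD_ID = -1`. Neither choice comes from the claims.

```python
# Hypothetical filler id for positions beyond the end of a shorter line.
PAD_ID = -1


def parse_events(lines):
    """Convert event lines into fixed-length example vectors of token ids."""
    # Divide each line into tokens separated by a blank space.
    tokenized = [line.split(" ") for line in lines]

    # Collect a single occurrence of each token: token -> integer id,
    # assigned in first-seen order, so no token is stored twice.
    dictionary = {}
    for tokens in tokenized:
        for tok in tokens:
            if tok not in dictionary:
                dictionary[tok] = len(dictionary)

    # The line with the greatest number of tokens fixes the vector length.
    length = max((len(tokens) for tokens in tokenized), default=0)

    # Convert each line into an example vector of that common length.
    examples = []
    for tokens in tokenized:
        vec = [dictionary[tok] for tok in tokens]
        vec.extend([PAD_ID] * (length - len(vec)))
        examples.append(vec)

    # The collected examples serve as the event examples for clustering.
    return dictionary, examples
```

For instance, `parse_events(["link down eth0", "disk full"])` returns the dictionary `{"link": 0, "down": 1, "eth0": 2, "disk": 3, "full": 4}` and the examples `[[0, 1, 2], [3, 4, -1]]`, both of length 3.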
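Claims 6, 17 and 29 recite grouping similar elements of the database into subgroups and generating a feature vector for each subgroup with an associated set of valid values. The claims leave both the similarity test and the vector layout open. This Python sketch assumes, purely for illustration, that events sharing the same token at a chosen position form a subgroup, and reads "feature vector with valid values" as one set of observed values per token position. The function name and grouping rule are hypothetical.

```python
from collections import defaultdict


def subgroup_feature_vectors(lines, key_position=0):
    """Group events by the token at key_position (hypothetical similarity
    rule) and record, per subgroup, the set of valid values at each position."""
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split(" ")
        key = tokens[key_position] if key_position < len(tokens) else ""
        groups[key].append(tokens)

    features = {}
    for key, members in groups.items():
        width = max(len(tokens) for tokens in members)
        # One set of valid values per token position within this subgroup.
        valid = [set() for _ in range(width)]
        for tokens in members:
            for i, tok in enumerate(tokens):
                valid[i].add(tok)
        features[key] = valid
    return features
```

With the lines `"login failed user alice"`, `"login failed user bob"` and `"disk full /var"`, the `"login"` subgroup's feature vector ends with the valid-value set `{"alice", "bob"}`, while every other position in that subgroup admits a single value.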
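Claims 10, 21 and 33 recite combining similar event examples into clusters and letting the user adjust clustering parameters to obtain different clusters. No particular clustering algorithm is named, so the Python sketch below swaps in a simple leader (threshold) clustering as a stand-in: an example joins the first cluster whose leader differs from it in at most `max_mismatches` positions, otherwise it starts a new cluster. `max_mismatches` is a hypothetical stand-in for the user-adjustable parameter.

```python
def cluster_examples(examples, max_mismatches=1):
    """Leader clustering over example vectors; max_mismatches is the
    user-adjustable threshold controlling how loose the clusters are."""
    clusters = []  # list of (leader, members) pairs
    for ex in examples:
        for leader, members in clusters:
            # Count differing positions, treating any length gap as mismatches.
            mismatches = sum(a != b for a, b in zip(leader, ex))
            mismatches += abs(len(leader) - len(ex))
            if mismatches <= max_mismatches:
                members.append(ex)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            clusters.append((ex, [ex]))
    return [members for _, members in clusters]
```

Given `[[0, 1, 2, 3], [0, 1, 2, 4], [5, 6, 7, 8]]`, a threshold of 1 yields two clusters (the first two examples together), while a threshold of 0 yields three singleton clusters, showing how changing the parameter changes the clusters presented to the user.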
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application shares specification text and figures with the following co-pending application, which was filed concurrently with the present application: application Ser. No. ______ (Attorney Docket Number AUS920020519US1) entitled “Developing and Assuring Policy Documents Through a Process of Refinement and Classification,” the entire contents of which are hereby incorporated by reference.