Crawler based auditing framework

Information

  • Patent Application
  • 20070226695
  • Publication Number
    20070226695
  • Date Filed
    January 03, 2007
  • Date Published
    September 27, 2007
Abstract
Systems, methods, and other embodiments associated with post-crawl auditing are described. One system embodiment includes an audit logic that can be controlled to apply an audit rule to crawl data. The crawl data may be acquired by a crawl logic that provides the crawl data to an index logic. The crawl logic may be configured to crawl documents stored in different locations in an enterprise. The crawl logic may also be configured to crawl documents having different formats. The index logic may be configured to create an index that supports searching for documents in the enterprise. The audit logic may process the crawl data independent of the operation of the index logic.
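The independence described in the abstract can be pictured with a minimal Python sketch. Everything named here (CrawlRecord, IndexLogic, AuditLogic, crawl, and the sample access-control rule) is a hypothetical illustration and is not taken from the application; the sketch only assumes that the crawl logic hands the same normalized crawl data to each consumer, so that the audit logic can be attached or removed without changing what the index logic builds.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical normalized crawl data for one document."""
    url: str
    content: str
    modified: datetime
    acl: Tuple[str, ...] = ()


class IndexLogic:
    """Toy inverted index: term -> set of document URLs."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}

    def consume(self, record: CrawlRecord) -> None:
        for term in record.content.lower().split():
            self.index.setdefault(term, set()).add(record.url)


class AuditLogic:
    """Applies one audit rule to crawl data; it never touches the index."""

    def __init__(self, rule: Callable[[CrawlRecord], bool]) -> None:
        self.rule = rule
        self.findings: List[str] = []

    def consume(self, record: CrawlRecord) -> None:
        # A rule returns True when the document complies with the standard.
        if not self.rule(record):
            self.findings.append(record.url)


def crawl(records: Iterable[CrawlRecord],
          consumers: List[Callable[[CrawlRecord], None]]) -> None:
    """Stand-in for the crawl logic: deliver each record to every consumer."""
    for record in records:
        for consume in consumers:
            consume(record)


# Assumed sample data and rule, for illustration only.
records = [
    CrawlRecord("file://share/a.doc", "quarterly budget",
                datetime(2007, 1, 3), ("finance",)),
    CrawlRecord("http://intranet/b.html", "budget memo",
                datetime(2007, 1, 4), ("Everyone",)),
]
index_logic = IndexLogic()
audit_logic = AuditLogic(lambda r: "Everyone" not in r.acl)
crawl(records, [index_logic.consume, audit_logic.consume])
print(audit_logic.findings)  # ['http://intranet/b.html']
```

Running the same crawl with only index_logic.consume in the consumer list yields an identical index, which is the sense in which the audit step in this sketch is independent of indexing.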
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate various example systems, methods, and other embodiments of various aspects of the invention. It will be appreciated that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the figures represent one example of the boundaries. One of ordinary skill in the art will appreciate that in some embodiments one element may be designed as multiple elements, multiple elements may be designed as one element, an element shown as an internal component of another element may be implemented as an external component and vice versa, and so on. Furthermore, elements may not be drawn to scale.


Prior Art FIG. 1 illustrates an enterprise search system.

FIG. 2 illustrates a portion of an example enterprise search system having post-crawl auditing functionality.

FIG. 3 illustrates an example enterprise search system having post-crawl auditing functionality.

FIG. 4 illustrates an example method associated with enabling post-crawl auditing.

FIG. 5 illustrates an example method associated with post-crawl auditing.

FIG. 6 illustrates an example computing environment in which example systems and methods illustrated herein may operate.


Claims
  • 1. A system, comprising: an audit logic to apply an audit rule to a crawl data to determine a state of a document with respect to compliance with an audit standard, where the crawl data is provided by a crawling logic configured to access a plurality of documents stored on a plurality of repositories, and where members of the plurality of documents may have different document types; and a signal logic to provide a signal based, at least in part, on the state of the document with respect to compliance with the audit standard.
  • 2. The system of claim 1, the audit logic being configured to selectively apply an audit rule based, at least in part, on a dynamically configurable parameter.
  • 3. The system of claim 2, the dynamically configurable parameter being related to one or more of, a user input, a schedule, a volume of crawl data observed, and a type of crawl data observed.
  • 4. The system of claim 1, the audit logic being configured to receive crawl data that includes a normalized set of data that includes one or more members relevant to a plurality of document types.
  • 5. The system of claim 1, the audit logic being configured to receive crawl data that includes document content.
  • 6. The system of claim 5, the audit logic being configured to receive crawl data that includes metadata concerning a document.
  • 7. The system of claim 6, the audit logic being configured to receive metadata that includes one or more of, a document modification time, a globally unique identifier (GUID) that identifies a modifier of a document, a document URL (Uniform Resource Locator), and a data source associated with a document.
  • 8. The system of claim 6, the audit logic being configured to receive crawl data that includes security data.
  • 9. The system of claim 8, the audit logic being configured to receive security data that includes one or more of, an access control entry (ACE), an access control list (ACL), and a security attribute.
  • 10. The system of claim 1, where the audit standard concerns one or more of, document location, document security, document modification, document, repository, and document access.
  • 11. The system of claim 1, including a rules index that stores information concerning one or more audit rules.
  • 12. The system of claim 1, including the crawling logic, where the crawling logic can operate independent of the presence of the audit logic.
  • 13. The system of claim 12, including an index logic to operate independent of the presence of the audit logic, the index logic to use crawl data to maintain an index that supports query processing for documents, the audit logic to operate independent of the index logic.
  • 14. The system of claim 13, including a query logic that can operate independent of the presence of the audit logic, the query logic being configured to provide a query to the index logic, the index logic being configured to identify one or more documents relevant to the query.
  • 15. A system, comprising: a crawling logic to provide a crawl data, the crawling logic to access a plurality of documents stored on a plurality of repositories, where members of the plurality of documents may have different document types, where the crawl data includes a normalized set of data that includes one or more members relevant to a plurality of document types, the normalized set of data including document content, metadata concerning a document, and security data concerning a document; an index logic to use the crawl data to maintain an index that supports query processing for documents belonging to the enterprise; a query logic to provide a query to the index logic, the index logic being configured to identify one or more documents relevant to the query; an audit logic to apply an audit rule to the crawl data to determine a state of a document with respect to compliance with an audit standard, the audit logic being configured to selectively apply an audit rule based, at least in part, on a dynamically configurable parameter related to one or more of, a user input, a schedule, a volume of crawl data observed, and a type of crawl data observed, a rules index to store information concerning one or more audit rules; and a signal logic to provide a signal based, at least in part, on the state of the document with respect to compliance with the audit standard, the crawling logic, the index logic, and the query logic being configured to operate independent of the presence of each other.
  • 16. A method, comprising: operably connecting an audit logic to an enterprise search system that includes a crawler logic and an index logic, the crawler logic being configured to provide a crawl data to the index logic, the index logic being configured to maintain an index of documents belonging to an enterprise based, at least in part, on the crawl data; and controlling the audit logic to audit a document belonging to the enterprise by processing the crawl data without altering the operation of the crawler logic and without altering the operation of the index logic.
  • 17. The method of claim 16, where controlling the audit logic to audit a document includes controlling the audit logic to apply an audit rule to the crawl data.
  • 18. A method, comprising: accessing a data generated by an enterprise search system, where the data concerns a document belonging to an enterprise; and performing an audit function on the document belonging to the enterprise by processing the data generated by the enterprise search system, where performing the audit function is performed independent of delivery to a recipient of the data generated by the enterprise search system.
  • 19. The method of claim 18, where accessing the data generated by the enterprise search system includes accessing data generated by a crawler that is configured to crawl a plurality of types of documents that may be stored in a plurality of repositories within the enterprise.
  • 20. The method of claim 19, where performing the audit function includes one or more of, comparing a modification date for a document to an audit standard concerning modification dates, and comparing an access time for a document to an audit standard concerning access times.
  • 21. The method of claim 20, where performing the audit function includes one or more of, comparing an identity for a user who accessed a document to an audit standard concerning document access, comparing a relocation event for a document to an audit relocation standard, and comparing a relocation destination for a document to the audit relocation standard.
  • 22. The method of claim 19, where performing the audit function includes applying a rule to the data generated by the enterprise search system.
  • 23. The method of claim 22, where applying a rule to the data generated by the enterprise search system includes selecting a rule from a rules index, where a term in the data generated by the enterprise search system indexes into the rules index.
  • 24. The method of claim 18, including controlling the enterprise search system to perform an additional search based, at least in part, on the results of performing the audit function on the document belonging to the enterprise.
  • 25. A machine-readable medium having stored thereon machine-executable instructions that if executed by a machine cause the machine to perform a method, the method comprising: accessing a data generated by a crawler that is configured to crawl a plurality of types of documents that may be stored in a plurality of repositories within an enterprise, where the data concerns a document belonging to the enterprise; and performing an audit function on the document belonging to the enterprise by processing the data generated by the crawler, where performing the audit function is performed independent of delivery to a recipient of the data generated by the crawler, and where performing the audit function includes one or more of, comparing a modification date for a document to an audit standard concerning modification dates, comparing an access time for a document to an audit standard concerning access times, comparing an identity for a user who accessed a document to an audit standard concerning document access, comparing a relocation event for a document to an audit relocation standard, and comparing a relocation destination for a document to the audit relocation standard, where performing the audit function includes applying a rule to the data generated by the crawler, and where applying a rule to the data generated by the crawler includes selecting a rule from a rules index, where a term in the data generated by the crawler indexes into the rules index.
  • 26. A system, comprising: means for accessing normalized data produced by an enterprise crawler configured to crawl a plurality of document types stored in a plurality of locations within an enterprise; means for maintaining an index of documents belonging to the enterprise, where the maintaining depends on the normalized data; and means for auditing a document in the enterprise by processing the normalized data, where auditing the document does not interfere with maintaining the index of documents.
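
As an illustration of the rule selection recited in claims 20 through 23, the following Python sketch shows one way a term in the crawl data might index into a rules index, with one rule comparing a modification date to an audit standard. It is a sketch under stated assumptions and not the claimed implementation: the record layout, the choice of the data source as the indexing term, the FREEZE cutoff, and all function names are hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict, List

# Hypothetical audit standard: no modifications after this cutoff.
FREEZE = datetime(2007, 1, 1)


def modified_before_freeze(record: dict) -> bool:
    """Compare the document modification time to the (assumed) standard."""
    return record["modified"] <= FREEZE


def not_world_readable(record: dict) -> bool:
    """Assumed security rule: the ACL must not grant access to 'Everyone'."""
    return "Everyone" not in record["acl"]


# Hypothetical rules index keyed by a term found in the crawl data.
# Here the term is the data source associated with the document.
RULES_INDEX: Dict[str, List[Callable[[dict], bool]]] = {
    "finance": [modified_before_freeze, not_world_readable],
    "hr": [not_world_readable],
}


def audit(record: dict) -> List[str]:
    """Return the names of the selected rules that the record violates."""
    rules = RULES_INDEX.get(record["data_source"], [])
    return [rule.__name__ for rule in rules if not rule(record)]


# Assumed sample crawl data for one document.
late_edit = {
    "data_source": "finance",
    "modified": datetime(2007, 2, 1),
    "acl": ["finance"],
}
print(audit(late_edit))  # ['modified_before_freeze']
```

The returned list of violated rule names stands in for the signal of claim 1; an empty list would mean that no selected rule found the document out of compliance.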
Provisional Applications (2)
Number Date Country
60777988 Mar 2006 US
60853507 Oct 2006 US