Index replication using crawl modification information

Information

  • Patent Application
    20070208716
  • Publication Number
    20070208716
  • Date Filed
    February 23, 2007
  • Date Published
    September 06, 2007
Abstract
Systems, methodologies, media, and other embodiments associated with index replication using crawl modification information are described. One exemplary system embodiment includes an enterprise search system comprising a target search system comprising an index logic that uses modified crawl information related to items associated with sources to maintain an index that supports searching of the items; and, a crawl search system comprising a pipeline processor configured to receive modified crawl information related to the items and to propagate the modified crawl information to the target system.
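The arrangement summarized in the abstract can be illustrated with a minimal sketch. It assumes an in-memory inverted index and a simple record for modified crawl information; the application does not specify data formats or a transport mechanism. The names `CrawlUpdate`, `IndexLogic`, and `PipelineProcessor.receive` are hypothetical and do not come from the application, and treating empty content as a deletion is likewise an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlUpdate:
    """Hypothetical record of modified crawl information for one item."""
    item_id: str
    content: str  # assumption: an empty string means the item was removed


class IndexLogic:
    """Target-side logic: maintains a searchable index from crawl updates."""

    def __init__(self):
        self.postings = {}       # term -> set of item ids
        self.terms_by_item = {}  # item id -> terms currently indexed for it

    def apply(self, update):
        # Drop the item's old postings before indexing the modified content.
        for term in self.terms_by_item.pop(update.item_id, set()):
            ids = self.postings[term]
            ids.discard(update.item_id)
            if not ids:
                del self.postings[term]
        terms = set(update.content.lower().split())
        if terms:
            self.terms_by_item[update.item_id] = terms
            for term in terms:
                self.postings.setdefault(term, set()).add(update.item_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class PipelineProcessor:
    """Crawl-side component: receives updates and propagates them to targets."""

    def __init__(self, targets):
        self.targets = list(targets)

    def receive(self, update):
        # Propagate the same modified crawl information to every target.
        for target in self.targets:
            target.apply(update)


target = IndexLogic()
pipeline = PipelineProcessor([target])
pipeline.receive(CrawlUpdate("doc1", "quarterly sales report"))
pipeline.receive(CrawlUpdate("doc1", "annual sales report"))
```

After the second update, `target.search("annual")` finds `doc1` while `target.search("quarterly")` no longer does, because the target rebuilds its own index entries from the propagated modification rather than receiving a copy of the crawl system's index.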
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate various example systems, methods, and other example embodiments of various aspects of the invention. It will be appreciated that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the figures represent one example of the boundaries. One of ordinary skill in the art will appreciate that one element may be designed as multiple elements or that multiple elements may be designed as one element. An element shown as an internal component of another element may be implemented as an external component and vice versa. Furthermore, elements may not be drawn to scale.



FIG. 1 illustrates an example enterprise search system.



FIG. 2 illustrates another example enterprise search system.



FIG. 3 illustrates an example distributed enterprise crawl system.



FIG. 4 illustrates an example method for replicating search information.



FIG. 5 illustrates an example method for replicating search information.



FIG. 6 illustrates an example method for replicating search information.



FIG. 7 illustrates an example method for replicating search information in a distributed crawl environment.



FIG. 8 illustrates an example computing environment in which example systems and methods illustrated herein can operate.


Claims
  • 1. An enterprise search system, comprising: a target search system comprising an index logic that uses modified crawl information related to items associated with sources to maintain an index that supports searching of the items; and, a crawl search system comprising a pipeline processor configured to receive modified crawl information related to the items and to propagate the modified crawl information to the target system.
  • 2. The enterprise search system of claim 1, where the index functionally replicates a primary index of the crawl search system.
  • 3. The enterprise search system of claim 2, where the index and the primary index employ different storage mechanisms.
  • 4. The enterprise search system of claim 1, further comprising a plurality of target search systems, where the index of each target system independently functionally replicates a primary index of the crawl search system.
  • 5. The enterprise search system of claim 1, further comprising a plurality of target search systems, where the indexes of the target systems collectively functionally replicate a primary index of the crawl search system.
  • 6. The enterprise search system of claim 1, where the items comprise at least one of documents, files, web pages, emails, spreadsheets and/or databases.
  • 7. The enterprise search system of claim 1, where the modified crawl information comprises at least one of modified content, metadata and/or security information.
  • 8. The enterprise search system of claim 1, further comprising a crawler logic configured to access the items to determine a modification to the items, the crawler logic further configured to provide modified crawl information to the pipeline processor.
  • 9. An enterprise search system, comprising: a plurality of target search systems, each target search system comprising an index logic that uses modified crawl information related to items associated with sources to maintain an index that supports searching of the items; a crawl search system comprising: a pipeline processor configured to receive modified crawl information related to the items and to propagate the modified crawl information to the plurality of target systems; and, a crawler logic configured to access the items to determine a modification to the items, the crawler logic further configured to provide modified crawl information to the pipeline processor.
  • 10. The enterprise search system of claim 9, where each of the plurality of target search systems is configured to process only a particular type of item.
  • 11. The enterprise search system of claim 9, where each of the plurality of target search systems is configured to process only items associated with a particular source.
  • 12. The enterprise search system of claim 9, where each of the plurality of target search systems is configured to process only items associated with one or more entities.
  • 13. The enterprise search system of claim 9, where the index of each target system independently functionally replicates a primary index of the crawl search system.
  • 14. The enterprise search system of claim 9, where the indexes of the target systems collectively functionally replicate a primary index of the crawl search system.
  • 15. A distributed enterprise crawl system, comprising: a first search system and a second search system, each search system comprising: an index logic that uses modified crawl information related to items associated with sources to maintain an index that supports searching of the items; a pipeline processor configured to receive modified crawl information related to the items and to propagate the modified crawl information to the other search system; and, a crawler logic configured to access the items to determine a modification to the items, the crawler logic further configured to provide modified crawl information to the pipeline processor and to the index of the particular search system, where the first search system is configured to access items associated with a first source and the second search system is configured to access items associated with a second source.
  • 16. The distributed enterprise crawl system of claim 15, where the index of each search system independently functionally replicates the index of the other search system.
  • 17. A method for replicating search information, comprising: crawling items associated with sources; identifying changes to the items; broadcasting information regarding the identified changes to a plurality of target search systems; and, independently updating an index associated with each target search system based on the identified changes.
  • 18. The method of claim 17, where each of the plurality of target search systems is configured to process only a particular type of item.
  • 19. The method of claim 17, where each of the plurality of target search systems is configured to process only items associated with a particular source.
  • 20. The method of claim 17, where each of the plurality of target search systems is configured to process only items associated with one or more entities.
  • 21. The method of claim 17, where the index of each target system independently functionally replicates a primary index of a crawl search system.
  • 22. The method of claim 17, where the indexes of the target systems collectively functionally replicate a primary index of a crawl search system.
  • 23. A computer-readable medium for providing processor executable instructions that when executed cause a computer to perform a method for replicating search information in a computing system comprising a plurality of search systems and a plurality of crawlers, the method comprising: distributing crawling assignments of a plurality of information sources between the plurality of crawlers; crawling and identifying, by each of the crawlers, changes to items from one or more assigned information sources; broadcasting, by each of the crawlers, the identified changes to the plurality of search systems; and independently updating, by each of the search systems, an index to the plurality of information sources using the identified changes broadcasted from each of the crawlers.
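
The steps recited in claims 17 and 23 (distribute crawl assignments, crawl and identify changes, broadcast the changes, independently update each index) can be sketched as follows. This is an illustration under stated assumptions, not the applicants' implementation: round-robin assignment, SHA-256 content digests for change detection, a nested dictionary standing in for the information sources, and all class and function names (`assign_sources`, `Crawler`, `SearchSystem`, `replicate`) are hypothetical choices made for the example.

```python
import hashlib


def assign_sources(sources, crawler_count):
    """Distribute sources across crawlers round-robin (one possible policy)."""
    assignments = [[] for _ in range(crawler_count)]
    for i, source in enumerate(sorted(sources)):
        assignments[i % crawler_count].append(source)
    return assignments


class Crawler:
    def __init__(self, assigned_sources):
        self.assigned_sources = assigned_sources
        self.seen = {}  # (source, item_id) -> content digest from the last crawl

    def crawl(self, repository):
        """Return changes since the last crawl as (source, item_id, content).

        A content of None marks an item that has disappeared (assumption).
        """
        changes = []
        current = set()
        for source in self.assigned_sources:
            for item_id, content in repository.get(source, {}).items():
                key = (source, item_id)
                current.add(key)
                digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
                if self.seen.get(key) != digest:  # new or modified item
                    self.seen[key] = digest
                    changes.append((source, item_id, content))
        for key in sorted(set(self.seen) - current):  # items no longer present
            del self.seen[key]
            changes.append((key[0], key[1], None))
        return changes


class SearchSystem:
    def __init__(self):
        self.index = {}  # (source, item_id) -> content

    def update(self, changes):
        # Each search system applies the broadcast changes to its own index.
        for source, item_id, content in changes:
            if content is None:
                self.index.pop((source, item_id), None)
            else:
                self.index[(source, item_id)] = content


def replicate(crawlers, search_systems, repository):
    for crawler in crawlers:
        changes = crawler.crawl(repository)
        for system in search_systems:  # broadcast to every search system
            system.update(changes)


repository = {"wiki": {"a": "hello"}, "mail": {"m1": "hi"}}
assignments = assign_sources(repository.keys(), 2)
crawlers = [Crawler(assigned) for assigned in assignments]
systems = [SearchSystem(), SearchSystem()]
replicate(crawlers, systems, repository)
```

Because only the identified changes are broadcast, a second call to `replicate` with an unchanged repository sends nothing, and each search system's index stays consistent with the others without any index-to-index copying.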
Provisional Applications (2)
Number Date Country
60777988 Mar 2006 US
60853487 Oct 2006 US