The present invention relates generally to text based searches and more particularly to providing text based searches for a self-contained document.
The primary mechanism used to locate documents on a web site is text based search. Typically the documents returned in the search application contain links to static documents or a landing page. There are a number of instances where a mediating process is needed to connect disparate web resources and functions to seamlessly maintain the user's context when performing a task. For example, there are on-line libraries which are built as self-contained document repositories. It is advantageous for users to be able to search multiple libraries from a single point in a large web site and find relevant information in any of these libraries. Unfortunately, many of these libraries have their own search engines, unique document retrieval mechanisms, and closed internal structures. As such, they are not easily federated into a single search.
A large organization may have a number of these libraries published on its web site. Devising a single search application that both searches the content of these separate libraries and is able to retrieve the pages where hits are detected is problematic. While it is possible to extract the text content from these libraries and index it in one search engine, it may not be possible to navigate to and retrieve specific information via a static HTML link to the library. While the search engine can find a text match, it cannot localize the hit to a single page and make that page retrievable as a link when the library is a closed, single-access-point entity.
Accordingly, what is needed is a system and method for allowing search engines to easily access information in self-contained documents. The system and method should be easily implemented utilizing existing search tools, should be cost-effective and adaptable to existing computing systems. The present invention addresses such a need.
A method for a text based search is disclosed. The method comprises providing search results which include at least one indexed self-contained document. The method also includes obtaining, by a broker, a search term from a previous URL of the at least one self-contained document, the broker including a unique identifier (UID) of the at least one self-contained document. Additionally, the method includes mapping, by the broker, the UID to an actual URL of the at least one self-contained document. Finally, the method comprises linking, by the broker, to the at least one self-contained document.
A system and method in accordance with the present invention include a web based (non-visual) mediator/broker which has logic to acquire the information necessary to pass requests to another web resource, either via parameters passed in the URL or by retrieving the previous URL and examining its content, or both. In the current implementation, the broker is used in conjunction with a search engine containing a fully text-indexed document which represents an entire Information Center. The document is indexed with a URL that includes a unique identifier (UID) as a parameter. This URL points to the broker, and the UID is mapped to a specific Information Center.
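The indexing arrangement described above can be illustrated with a minimal sketch. It assumes a hypothetical broker address, a hypothetical parameter name `uid`, and hypothetical Information Center URLs; none of these values is specified in this description.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical broker endpoint and UID table (assumptions for illustration).
BROKER_URL = "https://www.example.com/broker"
INFO_CENTERS = {
    "IC00001": "https://www.example.com/infocenter/alpha/index.jsp",
    "IC00002": "https://www.example.com/infocenter/beta/index.jsp",
}


def indexed_url(uid: str) -> str:
    """Build the URL under which an Information Center document is indexed.

    The URL points at the broker and carries the UID as a parameter.
    """
    return f"{BROKER_URL}?{urlencode({'uid': uid})}"


def information_center_for(url: str) -> Optional[str]:
    """Return the Information Center URL mapped to the UID in a broker URL."""
    params = parse_qs(urlparse(url).query)
    uid = params.get("uid", [None])[0]
    return INFO_CENTERS.get(uid)
```

In this sketch the search engine stores only the value returned by `indexed_url`, so the published location of an Information Center could change without re-indexing, provided the table is updated.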
The present invention relates generally to text based searches and more particularly to providing text based searches for a self-contained document. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiments and the generic principles and features described herein will be readily apparent to those skilled in the art. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features described herein.
A method and system in accordance with the present invention allow search engines to search self-contained/closed document repositories more effectively and offer a general solution for integrating search result links with any resource that is programmatically addressable as a web source. Further, a method and system in accordance with the present invention serve as an intelligent broker which can be programmed with logic to conditionally provide different links or parameter inputs based on information passed to the broker and/or information the broker can acquire from the calling party and its environment.
In a preferred embodiment, a broker is utilized in the form of a JSP or an ASP that can programmatically invoke an API of another self contained document. The broker maps incoming requests to a web resource via conditionally invoking preprogrammed mappings. To describe the features of the present invention in more detail, refer now to the following description in conjunction with the accompanying figures.
Environment
Indexing the Self Contained Information
Although an entire document can be placed in a single XML file, some indexers may require smaller files. Accordingly, it may be desirable to break the document into smaller files for use by the indexer. The process to index the self-contained information requires an administrator and an owner of the self-contained information (the Information Center owner). To describe this function refer now to the following description in conjunction with the accompanying figures.
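One way to break a document into smaller files is sketched below. The sketch assumes the document text is already available as a sequence of page strings and that the indexer's limit is expressed in bytes; the function name `batch_pages` and both assumptions are illustrative and are not taken from this description.

```python
from typing import Iterable, List


def batch_pages(pages: Iterable[str], max_bytes: int) -> List[List[str]]:
    """Group page texts into batches whose UTF-8 size stays within max_bytes.

    A single page larger than max_bytes is placed in a batch of its own
    rather than being split mid-page.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for page in pages:
        size = len(page.encode("utf-8"))
        # Start a new batch when adding this page would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(page)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch could then be written to its own file and submitted to the indexer under the same broker URL.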
The Administrator
The role of the administrator is to provide Information Center owners with the tools and a process to index the Information Center. Specifically, the administrator is responsible for:
1. Ensuring that each Information Center has a unique identifier.
2. Configuring the broker to redirect link requests to the owner's published Information Center URL.
3. Providing the Information Center owner (or their representatives) with access/authorization to the indexing tool.
4. Enforcing any policies on when indexing can occur and any file size limitations.
The administrator can be a person, an automated process or any other mechanism to perform these tasks. In a preferred embodiment, an individual performs the tasks.
Updating the Broker
As mentioned above, the broker 102′ must be updated before the indexing tool (not shown) is configured for a new Information Center 408. It is the responsibility of the administrator (not shown) to assign a UID to each Information Center 408. For example, a naming convention could be IC0000X, where the first Information Center is IC00001 and subsequent UIDs are assigned by incrementing the value: IC00002, IC00003, etc.
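The example naming convention can be sketched as follows. The sketch assumes a fixed width of five digits after the "IC" prefix, which is inferred from the examples above, and the function name `next_uid` is hypothetical.

```python
import re
from typing import Iterable

# Matches UIDs of the form IC00001 (prefix plus five digits, an assumption).
UID_PATTERN = re.compile(r"^IC(\d{5})$")


def next_uid(assigned: Iterable[str]) -> str:
    """Return the next unused UID in the IC00001, IC00002, ... sequence."""
    highest = 0
    for uid in assigned:
        match = UID_PATTERN.match(uid)
        if match:
            highest = max(highest, int(match.group(1)))
    if highest >= 99999:
        raise ValueError("five-digit UID space is exhausted")
    return f"IC{highest + 1:05d}"
```

Deriving the next value from the highest UID already assigned keeps each identifier unique even if earlier Information Centers are retired.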
The broker 102 as described above is an intelligent mediator which allows a first search engine to quickly access a properly indexed self-contained document (i.e., an Information Center). The broker 102 has logic therewithin which allows for this functionality. Specifically, the broker:
(a) Accepts URL requests.
(b) Contains logic to parse URLs for parameters.
(c) Contains logic (can be coded) to acquire the previous URL to detect search queries or parameters.
(d) Contains logic to conditionally map a request for forwarding/redirecting to another URL location based on a variety of conditions and states, including passed parameter values, the previous URL, or system state.
(e) Contains logic to conditionally modify a request when forwarding to another URL, including passing data parameters such as search terms or passing parameters serving as commands to a web resource.
(f) Maintains a list of mapping relations between passed parameter value(s) and an associated URL location to which the request is transferred or redirected.
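The broker functions listed above can be illustrated with a minimal sketch. The preferred embodiment is a JSP or an ASP; Python is used here only for brevity. The parameter names `uid`, `q`, and `searchWord`, the example URLs, and the function names are all assumptions made for illustration and are not specified in this description.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Item (f): hypothetical list of mappings from UID to published URL.
UID_TO_URL: Dict[str, str] = {
    "IC00001": "https://www.example.com/infocenter/alpha/index.jsp",
}


def search_terms_from(previous_url: Optional[str]) -> Optional[str]:
    """Item (c): pull the query text out of the previous URL, if present."""
    if not previous_url:
        return None
    params = parse_qs(urlparse(previous_url).query)
    values = params.get("q")  # "q" is an assumed search parameter name
    return values[0] if values else None


def redirect_target(request_url: str,
                    previous_url: Optional[str] = None) -> Optional[str]:
    """Return the URL to redirect to, or None when the UID is unknown."""
    # Items (a) and (b): accept the request URL and parse its parameters.
    params = parse_qs(urlparse(request_url).query)
    uid = params.get("uid", [""])[0]
    # Item (d): map the request to a target based on the passed UID.
    target = UID_TO_URL.get(uid)
    if target is None:
        return None
    # Item (e): conditionally pass the search terms along to the target.
    terms = search_terms_from(previous_url)
    if terms:
        return f"{target}?{urlencode({'searchWord': terms})}"
    return target
```

In a deployed broker, the previous URL would typically be taken from the HTTP Referer header of the incoming request, and the returned value would be sent back as a redirect.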
Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.