The present invention is described in detail below with reference to the attached drawing figures, wherein:
The subject matter of the present invention is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the term “step” may be used herein to connote different elements of methods employed, the term should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
Referring initially to
The invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. The invention may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, specialty computing devices, servers, etc. The invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
Referring now to
Embodiments of the invention provide searching for relevant data by permitting search results to be displayed to a user 112 in response to a user-specified search request (e.g., a search query). In one embodiment, the user 112 uses the client 102 to input a search request including one or more terms concerning a particular topic of interest for which the user 112 would like to identify relevant electronic documents (e.g., Web pages). For example, the front-end server 106 may be responsive to the client 102 for authenticating the user 112 and redirecting the request from the user 112 to the back-end server 108.
The back-end server 108 may process a submitted query using the index 110. In this manner, the back-end server 108 may retrieve data for electronic documents (i.e., search results) that may be relevant to the user. The index 110 contains information regarding electronic documents such as Web pages available via the Internet. Further, the index 110 may include a variety of other data associated with the electronic documents such as location (e.g., links, or URLs), metatags, text, and document category. In the example of
A search engine application (application) 114 is executed by the back-end server 108 to identify web pages and the like (i.e., electronic documents) in response to the search request received from the client 102. More specifically, the application 114 identifies relevant documents from the index 110 that correspond to the one or more terms included in the search request and selects the most relevant web pages to be displayed to the user 112 via the client 102.
The information gathered by the web crawler 202 and received by the feeds 204 may be submitted to an index builder 206. The index builder 206 may perform a variety of tasks necessary to index and store the information. For example, the index builder 206 includes a page classifier 208. The page classifier 208 may be configured to assign classification tags to the various documents received from the web crawler 202 and the feeds 204. In one embodiment, Web pages received from the web crawler 202 may be divided into a variety of subclasses based on a page's content. For example, Web pages with buying controls (e.g., “Buy buttons”) may allow the page to be tagged with a transactional tag. As another example, pages may offer information about a local business, restaurant or service. These pages may be tagged with a “local” tag to indicate a regional relevance for the page. Indeed, a wide variety of classification tags may be used by the page classifier 208 to divide the pages by type. In one embodiment, data is extracted from a Web page for evaluation by the page classifier 208. Using statistical models, the page classifier 208 may leverage a rule set in association with support vector machines to determine the tags to be associated with the Web pages. As will be appreciated by those skilled in the art, a variety of techniques exist for classifying documents with statistical models.
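By way of illustration only, the rule-set portion of such a classifier might be sketched as follows. The sketch uses keyword rules alone and omits the statistical models and support vector machines mentioned above; the rule patterns, the tag names and the function name `classify_page` are hypothetical and are not part of the described system.

```python
import re

# Hypothetical rule set: each tag maps to patterns whose presence suggests that tag.
RULES = {
    "transactional": [r"\bbuy now\b", r"\badd to cart\b"],
    "local": [r"\bopening hours\b", r"\bget directions\b"],
}


def classify_page(text):
    """Return the sorted list of classification tags whose rules match the page text."""
    lowered = text.lower()
    tags = [
        tag
        for tag, patterns in RULES.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    ]
    return sorted(tags)
```

In this sketch a page may receive several tags at once, consistent with the description of pages being divided into a variety of subclasses.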
The index builder 206 also includes an entity extractor 210, which is configured to generate metadata from information extracted from the tagged documents. In one embodiment, the extracted metadata is dependent upon the page's type (i.e., which classification tags have been assigned to the page). For example, a page may describe a particular product and be tagged as a “product” page. The extracted metadata for such a product page may include the price, product name, image and other salient attributes present on the page. As a further example, the metadata extracted from a “reviews” page may include a rating and a summary for various reviewed products/content. In one embodiment, for each type of document, the entity extractor 210 builds a visual DOM (Document Object Model) tree that can identify records on a page and cluster across these records to identify and extract common fields. In this manner, a format (or structure) for the metadata may be generated for the various document types. As will be appreciated by those skilled in the art, by gleaning metadata from documents based on the document type, the metadata may be tailored to maximize usefulness to a user evaluating search results.
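As a non-limiting sketch, type-dependent extraction might dispatch on a page's classification tags as shown below. Simple regular expressions stand in for the visual DOM tree and record clustering described above, and the extractor functions, field names and patterns are assumptions made for illustration.

```python
import re


def extract_product(text):
    """Pull a dollar price from a product page, if one is present."""
    match = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return {"price": float(match.group(1)) if match else None}


def extract_review(text):
    """Pull a rating written as 'N/5' from a reviews page, if one is present."""
    match = re.search(r"(\d(?:\.\d)?)\s*/\s*5", text)
    return {"rating": float(match.group(1)) if match else None}


# Hypothetical mapping from classification tag to the extractor for that page type.
EXTRACTORS = {"product": extract_product, "reviews": extract_review}


def extract_metadata(text, tags):
    """Run only the extractors that correspond to the page's classification tags."""
    metadata = {}
    for tag in tags:
        extractor = EXTRACTORS.get(tag)
        if extractor is not None:
            metadata[tag] = extractor(text)
    return metadata
```

Tags without a registered extractor are skipped, so the metadata format varies with the document type.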
The classification tags and the metadata may be stored along with the copies of the documents in an index 212. The index 212 may contain a variety of data associated with the electronic documents, such as document text, location, metadata, and tags. In short, the index 212 may contain data useful for a search operation to identify documents relevant to a query.
In one embodiment, the index 212 may include tags representing one or more confidence measures for indicating how useful a page is to one or more respective user intents. These tags may be the classification tags generated by the page classifier 208 and/or may be generated with reference to the classification tags and the metadata. For example, a “research” intent may be associated with a document containing a product's review and metadata associated with this review. As another example, the index 212 may store a tag indicating a “shopping” intent with a document having a “buy” button and metadata indicating pricing information. As demonstrated by these examples, the intent tags do not necessarily define the content of a document. Rather, the intent tags generally relate to how a document is likely to be used by a user. As will be appreciated by those skilled in the art, a variety of intent-based tags and formatted metadata may reside along with the documents in the index 212.
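For purposes of illustration, one way such confidence measures might be derived from the classification tags and the metadata is sketched below. The weights, the tag names and the function name `intent_confidences` are hypothetical; the description does not specify how the confidence measures are computed.

```python
def intent_confidences(tags, metadata):
    """Map a document's classification tags and metadata to intent confidences in [0, 1]."""
    scores = {}

    # Hypothetical weights: a buy control and a known price each raise "shopping" confidence.
    shopping = 0.0
    if "transactional" in tags:
        shopping += 0.6
    if metadata.get("price") is not None:
        shopping += 0.4
    if shopping:
        scores["shopping"] = round(shopping, 2)

    # Likewise, a reviews tag and an extracted rating each raise "research" confidence.
    research = 0.0
    if "reviews" in tags:
        research += 0.6
    if metadata.get("rating") is not None:
        research += 0.4
    if research:
        scores["research"] = round(research, 2)

    return scores
```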
The system 200 also includes a search component 214. The search component 214 is configured to receive a user search input 216 and to interact with the index 212 so as to identify a set of relevant documents responsive to the search input 216. Because the index 212 provides metadata and tags indicating an association between documents and potential user intents derived from the documents, the search component 214 may leverage this intent-based information. For example, the search component 214 may aggregate (i.e., group) the various documents by their related intents. In this manner, the intent tags in the result set may be identified, and the search component 214 may determine how well various results serve user intent in different situations.
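The grouping of documents by their related intents might, in one illustrative sketch, look like the following. The result records, with `id` and `intents` fields, are an assumed representation and are not prescribed by the description.

```python
from collections import defaultdict


def group_by_intent(results):
    """Group document ids by each intent tag they carry.

    A document tagged with several intents appears in several groups.
    """
    groups = defaultdict(list)
    for doc in results:
        for intent in doc["intents"]:
            groups[intent].append(doc["id"])
    return dict(groups)
```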
The search component 214 may further be configured to generate a presentation for display to the user. This presentation may be presented by a presentation component 218. In one embodiment, the presentation is presented via the Internet as a Web page. Because the search input 216 may not adequately indicate a user's intent when making the query, the presentation may include visual elements to aid the system 200 in identifying such user intent.
In one embodiment, the user may be presented with metadata from documents associated with various intents. Further, the user may be presented actions that may be performed with regard to the presented results. These actions may be a function of a page's type and available metadata. For example, “Get directions to this business” may be an available action for a page identified as a “local business.” The presentation may also include elements that explicitly identify potential intents. For example, the presentation may list intents for user selections. In one embodiment, the presentation may ask, “Are you looking to Shop, Research or For Local Listing?” By exposing actions and controls, the presentation offers hints as to what additional tools and services are available. In this manner, the system 200 may cluster actions and types by intent and present controls that allow the user to efficiently indicate their content of interest.
The system 200 also includes an intent determination component 220 for determining the user's intent. The intent determination component 220 may determine which of the identified intents most accurately matches a user's search query. Such a determination may be made based on user inputs to the displayed presentation. For example, the search input 216 may include the term “mouse.” In this instance, the identified intents may relate to a computer mouse and to an animal mouse. The user may select a visual element indicating their intended interest is a computer mouse. Accordingly, the intent determination component 220 may infer that the search term “mouse” relates only to a computer mouse, not any animals. Such an identified intent may be communicated to the search component 214 so that different results and rankings can be exposed based on this intent. Further, targeted metadata, actions and advertisements may be presented by the presentation component 218 based on the identified intent.
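The “mouse” example above might be sketched, purely by way of illustration, as follows. The intent labels, the result records and both function names are hypothetical; the sketch simply treats the user's selection of a visual element as the determined intent and restricts the results accordingly.

```python
def determine_intent(candidate_intents, selected_element=None):
    """Return the intent the user selected, or None if no valid selection was made."""
    if selected_element in candidate_intents:
        return selected_element
    return None


def filter_by_intent(results, intent):
    """Keep only results tagged with the intent; keep everything if no intent is known."""
    if intent is None:
        return list(results)
    return [doc for doc in results if intent in doc["intents"]]


# Hypothetical result set for the query "mouse".
results = [
    {"id": "wireless-mouse", "intents": ["computer mouse"]},
    {"id": "field-mouse", "intents": ["animal mouse"]},
]
intent = determine_intent(["computer mouse", "animal mouse"], "computer mouse")
computer_results = filter_by_intent(results, intent)
```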
In one embodiment, the intent determination component 220 refines the identified intent as the user continues to interact with the system. Based on the tags in the results set, a vertical search experience may be suggested to the user. A vertical search experience is a search over a subset of documents with a clear commonality. Since the search is scoped to documents of a certain type, additional features and functionality that leverage that commonality can be added to make it easier for the user to narrow their field of interest. For example, a user expressing an intent to purchase a car may be interested in either purchasing a used car from an Internet dealer, finding the address of a new car dealer in their area or searching classified ads. The intent determination component 220 may seek to determine which of these options (or more specific intents) the user desires. Once the intent is further refined, the search component 214 may provide the user the correct organized, vertical search experience. As will be appreciated by those skilled in the art, by providing an interface that allows the user to identify their intent and by leveraging the intent-based data in the index 212, the system 200 can capture the user's intent in a guided fashion and then provide a search experience with content, tools and ads targeted to that intent.
At 304, the method 300 extracts information from the electronic documents. For example, the extracted information may serve as metadata accompanying the electronic documents in a file store or an index. A variety of information may be extracted at 304. In one embodiment, the extracted information is selected based on a document's classification tags. In this embodiment, the extracted metadata may be formatted in accordance with the content available on the Web page. For example, a tag may indicate that a Web page contains a job listing. For each of such Web pages, the extracted metadata may include the job title and salary range. So the most salient information for job seekers may be stored as metadata along with a job listing Web site. The method 300, at 306, stores the documents in an index along with the extracted information and/or the classification tags.
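As one non-limiting sketch of step 306, documents might be stored together with their classification tags and extracted metadata in a simple in-memory structure such as the one below. The class name `DocumentIndex`, its methods and the example URL are assumptions made for illustration, not a description of the index itself.

```python
class DocumentIndex:
    """Minimal in-memory index keeping each document with its tags and metadata."""

    def __init__(self):
        self._docs = {}    # url -> stored record
        self._by_tag = {}  # tag -> set of urls carrying that tag

    def add(self, url, text, tags, metadata):
        """Store a document along with its classification tags and extracted metadata."""
        self._docs[url] = {"text": text, "tags": set(tags), "metadata": dict(metadata)}
        for tag in tags:
            self._by_tag.setdefault(tag, set()).add(url)

    def with_tag(self, tag):
        """Return the urls of all documents carrying the given tag, sorted."""
        return sorted(self._by_tag.get(tag, set()))

    def get(self, url):
        """Return the stored record for a url."""
        return self._docs[url]


index = DocumentIndex()
index.add(
    "https://example.com/jobs/1",  # placeholder URL
    "Senior editor wanted",
    ["job listing"],
    {"job_title": "Senior editor", "salary_range": "unspecified"},
)
```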
Once the set of responsive documents is generated, the method 400 aggregates the tags associated with the responsive documents at 404. In one embodiment, these tags may represent the potential intents of the user when making the query. Based on these tags, it may be determined how well the responsive documents serve a user's intent in different situations. For example, various documents in the result set may have tags indicating a strong relevance to serving a user that intends to purchase a certain product.
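The aggregation at 404 might be illustrated by counting how many responsive documents carry each tag, so that the most common tags can serve as candidate intents. This counting scheme and the `tags` field are assumptions; the description does not limit how the tags are aggregated.

```python
from collections import Counter


def aggregate_tags(responsive_docs):
    """Count how many responsive documents carry each tag, most common first."""
    counts = Counter()
    for doc in responsive_docs:
        # Use a set so a tag repeated on one document is counted once.
        counts.update(set(doc["tags"]))
    return counts.most_common()
```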
The method 400, at 406, displays visual elements to the user. Any number of visual elements relevant to the search results may be displayed. In one embodiment, the aggregated tags are used in the selection of these elements. For example, the user may be presented elements associated with the aggregated tags. By selecting a visual element, the user may indicate their intended content of interest. For example, the user may be presented a listing of various tags for selection, and the listing might correspond to tags in the result, including possibly a subset of the aggregated tags. The user may also be presented search results, actions and/or metadata relevant to a portion of the tags.
User interaction with such visual elements may be used to determine the user's intent and, at 408, the method 400 receives a user's selection of a visual element. Based on this selection, the method 400 may assign an intent to the search query at 410. For example, a user may submit a search query with the term “Apple.” The visual elements presented in this example may relate to both Apple computers and the fruit apple. User selection of an element associated with the fruit apple will indicate the user's desire to view information on the fruit apple, not on an Apple computer. As will be appreciated by those skilled in the art, by exposing various results, controls and actions corresponding to different potential user intents, the user may be afforded the ability to indicate their actual intent.
Based on the identified intent, the method 400, at 412, generates or refines targeted results for presentation to the user. In one embodiment, the presented results and/or their ranking depend on the identified intent. Further, the exposed metadata, controls and advertisements may also be targeted to the identified intent. Returning to the apple example, the user may be presented a variety of search results relating to fruit apples, and/or advertisements for fruit apples might be presented. The various visual elements in this presentation may be designed to further refine the user's intent. For example, various results may address the health benefits of eating apples, while other results may provide retailers selling apples. Upon user interaction with the results, the method 400, at 414, can further refine the results by identifying a more narrowly-tailored intent. In this manner, the user may be guided into a vertical search scenario allowing for a structured approach to efficiently locate desired and useful content.
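By way of example only, ranking results in dependence on the identified intent might be sketched as a sort on a per-intent confidence value. The result records, the confidence values and the function name `rank_for_intent` are hypothetical.

```python
def rank_for_intent(results, intent):
    """Order results by confidence for the intent, highest first.

    Documents that lack the intent are treated as having zero confidence.
    """
    return sorted(
        results,
        key=lambda doc: doc["intents"].get(intent, 0.0),
        reverse=True,
    )


# Hypothetical results for the query "Apple" after the user selects the fruit intent.
results = [
    {"id": "laptop-store", "intents": {"computer": 0.9}},
    {"id": "orchard-shop", "intents": {"fruit": 0.8}},
    {"id": "apple-nutrition", "intents": {"fruit": 0.95, "research": 0.7}},
]
ranked_ids = [doc["id"] for doc in rank_for_intent(results, "fruit")]
```

The same function could be applied again with a narrower intent label as the user's intent is refined at 414.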
Alternative embodiments and implementations of the present invention will become apparent to those skilled in the art to which it pertains upon review of the specification, including the drawing figures. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description.