This application is related to the co-pending and co-assigned applications entitled “TWO-DIMENSIONAL CONDITIONAL RANDOM VIEWS FOR WEB EXTRACTION,” filed on Dec. 16, 2005 and assigned Ser. No. 11/304,500; “METHOD AND SYSTEM IDENTIFYING OBJECT INFORMATION,” filed on Apr. 14, 2005 and assigned Ser. No. 11/106,383; and “METHOD AND SYSTEM FOR RANKING OBJECTS OF DIFFERENT OBJECT TYPES,” filed on Apr. 14, 2005 and assigned Ser. No. 11/106,017. The above-noted applications are incorporated herein by reference.
Communication networks, such as the Internet, allow users from different locations to access data from anywhere in the world. Because of the vastness of the amount of information, users typically employ search engines to find relevant information. This allows the vast amounts of data to be easily accessible to users in any location by simply entering a search query. Results of the query are then returned to the user in a search result list. Typically, these lists are “flat” or one dimensional. In other words, the search results are ranked solely on the search query entered by the user.
The usefulness of such a search result list depends on several factors: adequacy of the search string (i.e., is this really what the user is interested in), accessibility of relevant data by the search engine, and proper relevancy ranking of the data by the search engine. Thus, a poorly worded search string will not return favorable results to a user. Even if the search string is properly worded, the search results will be less than effective if the search engine does not have access to relevant data. If access is available, but the search engine lists the search results in a large one-dimensional list according to a single relevancy, the user may become overwhelmed and be dissatisfied with the search results. Users generally prefer a search engine that can return relevant data quickly, efficiently, and in an easily readable format. However, search engines do not generally provide relevancy flexibility in the presentation of the search results.
Search results are ranked utilizing multiple bases of relevancy. This allows search result lists to be further refined into relevant groupings. The ‘group-by’ parameters are derived from search result attributes. Attribute values derived from the attributes are then utilized in a ranking scheme to further group the search results based on attribute value relevancy. The grouped search results can then be displayed to users via a search result page. In one instance users can select which attribute value is used to group the search result list. This gives the user substantial control over relevancy groupings within the search result list. Ranking processes are based on object ranking algorithms that consider each attribute value as an object type. Some instances provide for search result list condensing of groupings based on relevancy of the attribute values as well. Although applicable to an infinite amount of search results, a top-k instance can be employed to limit the search results to bound the amount of time required for processing search result lists. By grouping search results based on attribute values, users are provided with an organizational means to control formatting and presentation of search results based on further relevancy in a secondary aspect to their original search.
The above presents a simplified summary of the subject matter in order to provide a basic understanding of some aspects of subject matter embodiments. This summary is not an extensive overview of the subject matter. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the subject matter. Its sole purpose is to present some concepts of the subject matter in a simplified form as a prelude to the more detailed description that is presented later.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of embodiments are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the subject matter may be employed, and the subject matter is intended to include all such aspects and their equivalents. Other advantages and novel features of the subject matter may become apparent from the following detailed description when considered in conjunction with the drawings.
The subject matter is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject matter. It may be evident, however, that subject matter embodiments may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the embodiments.
As used in this application, the term “component” is intended to refer to a computer-related entity, either hardware, or a combination of hardware and software. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, and/or a computer. By way of illustration, both an application running on a server and the server can be a computer component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
A result set for a given search term in traditional internet search engines is typically a flat list. However, this assumes that a user is only interested in one aspect of the search results. This is generally not the case, and, thus, it is beneficial to an end user to group search result sets by some additional aspect based on attributes of the search results. For example, a general search based on “jokes” can be grouped by writers, a shopping search based on “digital cameras” can be grouped by brands, and an academic paper search based on “data mining” can be grouped by authors and the like. Instances provided herein include methods that produce ‘group-by’ search result listings. For example, popular attribute values can be utilized with object ranking that ranks attribute values by dynamic ranks of search results possessing these attribute values. Group-by search results can then be displayed on web pages according to their attribute values. In some instances, several results can follow each attribute value in a web page.
In
The attribute grouping component 102 takes the search result attribute information 104 and utilizes it to group search results 108 based on the attribute values to form the grouped search results 106. The attribute grouping component 102 can utilize various algorithms to accomplish the ranking of attribute values. Typically, the search results 108 are ranked according to a general relevancy standard in a flat search result list. This ranking can be employed along with the attribute value to form a preliminary sorting list of results. The attribute values are then ranked and employed to further sort the search results 108 to construct the grouped search results 106. The processes involved with performing the sorting are detailed infra.
The group-by sorting of the search results 108 allows the group-by search result system 100 to provide users with information in a format that provides additional inherent data information. A user can almost instantly glean information from the presented format that normally would take additional searches, or data mining, to discover. For example, the scholarly academic paper search can yield two significant authors with 50 papers each listed in the grouped search results 106. The user can easily deduce that these authors are significant contributors to this academic arena and also easily peruse their works. If a similar search showed hundreds of authors with a single paper each, it could be deduced that there are no single significant contributors to this area of knowledge. Thus, the user gains more from the experience of utilizing the group-by search result system 100 than just the convenience of having an author's papers grouped together. Therefore, users of the group-by search result system 100 have a significant advantage over users of traditional search engines that return flat search result lists.
Looking at
The attribute value ranking component 204 can also accept system and/or user attribute preferences 216. The system and/or user attribute preferences 216 can include, but are not limited to, desired attributes and/or attribute values and the like. This allows a system and/or user to influence which attributes are utilized by the attribute value ranking component 204 and, thus, subsequently influence the grouped search results 214. The attribute value ranking component 204 sorts the search results 218 based on their associated ranking, re-sorts them based on their associated attribute values, ranks the attribute values, and then applies the attribute value ranking to the search results 218. This yields groups of search results 218 that are based on their associated attribute values. The attribute utilized by the attribute value ranking component 204 can, as stated previously, change based on system and/or user input and the like.
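The four steps performed by the attribute value ranking component can be illustrated with a minimal Python sketch. It is not the implementation described in this application: the function name `group_by_attribute`, the dictionary layout of a result, the convention that a higher `rank` means more relevant, and the use of a plain sum as the attribute value rank are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_attribute(results, attribute, combine=sum):
    """Group a flat, ranked result list by the values of one attribute.

    Each result is assumed to be a dict with a numeric "rank" (higher is
    more relevant) plus arbitrary attribute keys. `combine` is a
    hypothetical stand-in for the attribute value ranking function.
    """
    # Step 1: sort the results by their own rank, best first.
    by_rank = sorted(results, key=lambda r: r["rank"], reverse=True)

    # Step 2: re-sort into buckets keyed by attribute value; the rank
    # order from step 1 is preserved inside each bucket.
    groups = defaultdict(list)
    for result in by_rank:
        groups[result.get(attribute, "(none)")].append(result)

    # Step 3: rank each attribute value from the ranks of its results.
    value_rank = {
        value: combine(r["rank"] for r in members)
        for value, members in groups.items()
    }

    # Step 4: apply the attribute value ranking to order the groups.
    ordered_values = sorted(groups, key=lambda v: value_rank[v], reverse=True)
    return [(value, groups[value]) for value in ordered_values]


# Made-up sample data for a "digital cameras" query grouped by brand.
results = [
    {"title": "A", "rank": 9, "brand": "Sony"},
    {"title": "B", "rank": 8, "brand": "Canon"},
    {"title": "C", "rank": 7, "brand": "Canon"},
    {"title": "D", "rank": 2, "brand": "Sony"},
]
grouped = group_by_attribute(results, "brand")
for value, members in grouped:
    print(value, [r["title"] for r in members])
```

Passing a different `attribute` or a different `combine` function corresponds to the system and/or user preferences influencing which attribute is used and how its values are ranked.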
The search result display component 206 receives the group-by ranking from the attribute value ranking component 204 and formats it for relaying to a user as grouped search results 214. The relaying to the user typically consists of visual representations that allow a user to easily comprehend the groupings and, thus, the attributes and their values. This can include offsetting attribute values relative to an attribute, incorporating color schemes to distinguish attributes from their values, and/or other schemes to relay information to the user and the like. The search result display component 206 can also incorporate non-visual relaying to a user. This can be accomplished utilizing aural information and/or other sensory information and the like. Thus, in one instance, the grouped search results 214 can be read to a user and the like. In another instance, the grouped search results 214 can be presented in a Braille format to a user and the like. The relaying of the information by the search result display component 206 is not limited to only those listed herein.
Turning to
Referring to
The algorithm for calculating the attribute value rank is referred to as “object ranking,” which means that each attribute value can be treated as an object and, thus, the rank of this object can be calculated. One object ranking algorithm that can be utilized for attribute value ranking is Eq. 1 where:
$S_{\mathrm{attr}} = (R_1, R_2, \ldots, R_k)$
$R_{\mathrm{attr}} = f(S_{\mathrm{attr}})$
where $R_1, \ldots, R_k$ are dynamic ranks of results which have an attribute value “attr.” The $f(S_{\mathrm{attr}})$ can be any combination function. For example:
where c is a constant float number (e.g., scaling factor) that can be varied to emphasize and/or de-emphasize a ranking value.
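The relationship between the dynamic ranks and the attribute value rank can be sketched in Python as follows. The example formula of the original is not reproduced in this text, so the combination function below (`scaled_sum`, a constant c times the sum of the dynamic ranks) is purely a hypothetical choice made for illustration, as are all function names.

```python
from typing import Callable, Iterable, Sequence


def scaled_sum(ranks: Sequence[float], c: float = 1.0) -> float:
    """Hypothetical combination function: c times the sum of the ranks.

    This is an assumed example of f, not the formula from the source.
    """
    return c * sum(ranks)


def attribute_value_rank(
    ranks: Iterable[float],
    f: Callable[[Sequence[float]], float] = scaled_sum,
) -> float:
    """Compute R_attr = f(S_attr) for one attribute value.

    S_attr is the tuple (R_1, ..., R_k) of dynamic ranks of the results
    that carry the attribute value; f can be any combination function.
    """
    s_attr = tuple(ranks)
    return f(s_attr)


# The same dynamic ranks under three different combination functions.
ranks_for_value = [3, 2, 1]
print(attribute_value_rank(ranks_for_value))
print(attribute_value_rank(ranks_for_value, f=max))
print(attribute_value_rank(ranks_for_value, f=lambda s: scaled_sum(s, c=0.5)))
```

Keeping f as a parameter mirrors the statement that any combination function can be used; varying c only rescales the contribution of the dynamic ranks in this particular sketch.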
In one instance, a group-by search result process returns a list of attribute values sorted by descending attribute value rank. For each attribute value, there are typically several results which have this attribute value. Thus, some of these values can be condensed to provide a top-k search result list. In TABLE 1, below, an example sorting process is described.
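TABLE 1 itself is not reproduced in this text. Independently of it, a top-k group-by pass can be sketched in Python under stated assumptions: results are dictionaries with a numeric `rank` (higher is better), only the k highest-ranked results are processed, each group keeps a `root` entry for its highest-ranked member, and the attribute value rank is taken to be the sum of member ranks. All names here are hypothetical.

```python
def top_k_groups(results, attribute, k):
    """Group only the k highest-ranked results by an attribute value.

    Returns (attribute value, group) pairs sorted by descending attribute
    value rank, where the rank is assumed to be the sum of member ranks.
    """
    # Bound the processing time by looking at the top k results only.
    top = sorted(results, key=lambda r: r["rank"], reverse=True)[:k]

    groups = {}
    for result in top:
        value = result[attribute]
        # The first result seen for a value is its highest-ranked one,
        # because `top` is already in descending rank order.
        group = groups.setdefault(value, {"root": result, "members": []})
        group["members"].append(result)

    return sorted(
        groups.items(),
        key=lambda item: sum(m["rank"] for m in item[1]["members"]),
        reverse=True,
    )


# Made-up sample data: four papers, of which only the top three are used.
papers = [
    {"id": 1, "rank": 10, "author": "X"},
    {"id": 2, "rank": 9, "author": "Y"},
    {"id": 3, "rank": 8, "author": "Y"},
    {"id": 4, "rank": 1, "author": "Z"},
]
condensed = top_k_groups(papers, "author", k=3)
for value, group in condensed:
    print(value, group["root"]["id"], [m["id"] for m in group["members"]])
```

Because the truncation happens before grouping, an attribute value whose results all fall outside the top k (author "Z" in the sample data) does not appear in the output at all.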
“Result.root” points to a result which has the same attribute value and the highest rank. In
In the related and cross-referenced applications Ser. No. 11/106,383, published Oct. 19, 2006 as No. 2006/0235875, and Ser. No. 11/304,500, published on Jun. 28, 2007 as No. 2007/0150486, both to Ji-Rong Wen et al., an object-oriented search engine can benefit from identifying and labeling object information of an information page.
In one aspect of the object oriented search engine, the information extraction system identifies the object blocks of an information page. An object block is a collection of information that relates to a single object. For example, an advertisement for a camera may be an object block and the matching object is the uniquely identified camera. The extraction system classifies the object blocks into object types. For example, an object block that advertises a camera may be classified as a product type, and an object block relating to a journal paper may be classified as a paper type. Each object type has associated attributes that define a schema for the information of the object type. For example, a product type may have attributes of manufacturer, model, price, description, and so on. A paper type may have attributes of title, author, publisher, and so on. The extraction system identifies object elements within an object block that may represent an attribute value for the object. For example, the object elements of an advertisement of a camera may include manufacturer, model, and price. The extraction system may use visual features (e.g., font size and separating lines) of an information page to help identify the object elements. After the object elements are identified, the extraction system attempts to identify which object elements correspond to which attributes of the object type in a process referred to as “labeling.” For example, the extraction system may identify that the object element “Sony” is a manufacturer attribute and the object element “$599” is a price attribute. The extraction system uses an algorithm to determine the confidence that a certain object element corresponds to a certain attribute. The extraction system then selects the set of labels with the highest confidence as being the labels for the object elements. In this way, the extraction system can automatically identify information of an object.
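The labeling step, selecting the set of labels with the highest confidence, can be sketched in Python as an exhaustive search over assignments. The scoring rules in `confidence` are toy heuristics invented for this illustration; the confidence algorithm of the extraction system is not specified here, and the function names are hypothetical.

```python
from itertools import permutations


def confidence(element, attribute):
    """Toy confidence that `element` is a value of `attribute`.

    These heuristics are assumptions for illustration only.
    """
    if attribute == "price":
        return 0.9 if element.startswith("$") else 0.1
    if attribute == "manufacturer":
        return 0.8 if element.isalpha() else 0.1
    if attribute == "model":
        has_digit = any(ch.isdigit() for ch in element)
        return 0.7 if has_digit and not element.startswith("$") else 0.2
    return 0.0


def best_labeling(elements, attributes):
    """Return the element-to-attribute mapping with the highest total score.

    Each attribute labels at most one element. Returns None when there
    are more elements than attributes, since no assignment exists then.
    """
    best, best_score = None, float("-inf")
    for assignment in permutations(attributes, len(elements)):
        score = sum(confidence(e, a) for e, a in zip(elements, assignment))
        if score > best_score:
            best, best_score = dict(zip(elements, assignment)), score
    return best


elements = ["Sony", "DVS-V1", "$599"]
labels = best_labeling(elements, ["manufacturer", "model", "price"])
print(labels)
```

Enumerating every permutation is only practical for the handful of elements in a single object block; it is used here because it makes the "highest-confidence set of labels" idea explicit, not because the extraction system is known to work this way.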
In one aspect of the object oriented search engine, the extraction system uses an object data store to assist in labeling the object elements. An object data store may contain an entry for each object of a certain object type. For example, a product data store may contain an entry for each unique product. Each entry of a product data store contains the attribute values for the attributes of the object to which the entry corresponds. For example, an entry for a camera may have the value of “Sony” for its manufacturer attribute. The object data store may be a pre-existing data store, such as a product database, or may be created dynamically as the extraction system identifies objects. When determining the confidence in a labeling of an object element, the extraction system may compare that object element to the attribute values within the object data store. For example, the extraction system may determine that the object element “Sony” is more likely a manufacturer attribute because it matches many of the attribute values of the manufacturer attribute in the product data store. The labeling of one object element may depend on the labeling of another object element. For example, if the extraction system is confident that the object element “Sony” is a manufacturer attribute, then the extraction system may not label any other object element with the manufacturer attribute. The extraction system may use feature functions defined for a specific object type that score the likelihood that an object element corresponds to a certain attribute.
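One way an object data store could inform labeling confidence is sketched below in Python. The scoring rule (the fraction of stored values for an attribute that equal the element, ignoring case) is an assumption made for illustration, as are the function name `store_confidence` and the sample entries other than the “Sony” and “DVS-V1” values taken from the examples in this description.

```python
def store_confidence(element, attribute, store):
    """Fraction of stored values of `attribute` that match `element`.

    `store` is assumed to be a list of dict entries, one per object.
    Returns 0.0 when no entry carries the attribute.
    """
    values = [entry[attribute] for entry in store if attribute in entry]
    if not values:
        return 0.0
    matches = sum(1 for value in values if value.lower() == element.lower())
    return matches / len(values)


# Hypothetical product data store with four entries.
product_store = [
    {"manufacturer": "Sony", "model": "DVS-V1"},
    {"manufacturer": "Sony", "model": "A100"},
    {"manufacturer": "Canon", "model": "G5"},
    {"manufacturer": "Nikon", "model": "D50"},
]

as_manufacturer = store_confidence("Sony", "manufacturer", product_store)
as_model = store_confidence("Sony", "model", product_store)
print(as_manufacturer, as_model)
```

In this sketch “Sony” scores higher as a manufacturer than as a model because it matches stored manufacturer values and no stored model values, which is the kind of evidence the paragraph above describes.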
In one aspect of the object oriented search engine, the extraction system may use the object elements with their labels to identify the object of the object data store to which the object elements correspond. For example, if the extraction system labels the object element “Sony” as a manufacturer attribute and the object element “DVS-V1” as a model attribute, then the extraction system may be able to identify an entry of the object data store that has the same attribute values. In such a case, the extraction system may assume that the object elements match the object of that entry. The extraction system can use the knowledge of the match to help label other object elements. For example, the knowledge of the matching object may be used to help identify the object element “CD-1” as a battery attribute. The extraction system can also update the information of the entry based on the object elements. For example, if an object element indicates that the price of the camera is $549.95 and the previous lowest price was $599, then the extraction system may update a lowest-price attribute and a corresponding vendor attribute. If the extraction system is unable to identify a matching object, then the extraction system may add a new entry to the object data store. The extraction system may assume a match between object elements and an entry when the object elements match on certain key attributes such as those that uniquely identify an object.
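Matching labeled object elements to a data store entry and updating that entry can be sketched in Python as follows. The choice of manufacturer and model as key attributes, the field names `lowest_price` and `vendor`, the vendor names, and the function names are all assumptions made for illustration.

```python
KEY_ATTRIBUTES = ("manufacturer", "model")  # assumed to identify an object


def find_match(labeled, store):
    """Return the first entry agreeing with `labeled` on every key attribute."""
    for entry in store:
        if all(
            labeled.get(key) is not None and labeled.get(key) == entry.get(key)
            for key in KEY_ATTRIBUTES
        ):
            return entry
    return None


def merge(labeled, store, vendor):
    """Update the matching entry, or add a new entry when none matches."""
    entry = find_match(labeled, store)
    price = labeled.get("price")
    if entry is None:
        # No matching object: create a new entry from the labeled elements.
        entry = {key: labeled.get(key) for key in KEY_ATTRIBUTES}
        entry["lowest_price"] = price
        entry["vendor"] = vendor
        store.append(entry)
        return entry
    # Matching object: record a new lowest price and the vendor offering it.
    current = entry.get("lowest_price")
    if price is not None and (current is None or price < current):
        entry["lowest_price"] = price
        entry["vendor"] = vendor
    return entry


store = [
    {"manufacturer": "Sony", "model": "DVS-V1",
     "lowest_price": 599.00, "vendor": "Shop A"},
]
updated = merge(
    {"manufacturer": "Sony", "model": "DVS-V1", "price": 549.95},
    store,
    vendor="Shop B",
)
print(updated)
```

A second call with labeled elements that match no entry on the key attributes appends a new entry instead of modifying an existing one.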
In view of the exemplary systems shown and described above, methodologies that may be implemented in accordance with the embodiments will be better appreciated with reference to the flow charts of
The embodiments may be described in the general context of computer-executable instructions, such as program modules, executed by one or more components. Generally, program modules include routines, programs, objects, data structures, etc., that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various instances of the embodiments.
In
Turning to
A user selectable attribute input is then provided for the grouped search result list 910. In some instances, the user selectable attribute input is a listing of possible attributes on a web page. The listing can have names of attributes that are clickable or otherwise selectable via an input means such as, for example, a mouse, keystroke, visual cueing system, and/or voice command and the like. Other instances can allow direct user input of attribute names in a text field and the like. Still other instances can allow other means of selection and/or input. The search result list is then regrouped based on the selected attribute when prompted 912, ending the flow 914. When a user (or even a system) selects a different attribute, the search results are re-sorted based on the selected attribute. In this manner, a user can effortlessly mine the search results for additional information. For example, a user can select ‘conferences’ and determine who attended (even though a search query was based on various paper topics) and then select ‘journals’ to see which authors publish on topics related to the search query and the like. These types of information can be easily obtained by utilizing group-by search result processing. This greatly increases the value of a search engine and substantially enhances user satisfaction.
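Regrouping on a user-selected attribute can be sketched in Python as follows. The result layout, the attribute names, the placeholder label for results lacking the selected attribute, and both function names are hypothetical; how a selection reaches the system (click, keystroke, voice command) is outside the sketch.

```python
from collections import defaultdict


def selectable_attributes(results):
    """List the attribute names a page could offer as group-by choices."""
    names = set()
    for result in results:
        names.update(key for key in result if key not in ("title", "rank"))
    return sorted(names)


def regroup(results, selected):
    """Regroup result titles by the selected attribute, best-ranked first."""
    groups = defaultdict(list)
    for result in sorted(results, key=lambda r: r["rank"], reverse=True):
        groups[result.get(selected, "(unknown)")].append(result["title"])
    return dict(groups)


# Made-up paper search results.
papers = [
    {"title": "P1", "rank": 3, "author": "Author A", "conference": "Conf 1"},
    {"title": "P2", "rank": 2, "author": "Author B", "conference": "Conf 1"},
    {"title": "P3", "rank": 1, "author": "Author A", "journal": "Journal 1"},
]

choices = selectable_attributes(papers)
by_conference = regroup(papers, "conference")
by_author = regroup(papers, "author")
print(choices)
print(by_conference)
print(by_author)
```

The underlying result list is never modified; each selection simply produces a new grouping of the same results, which is what lets a user switch between attributes without issuing a new query.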
Instances provided herein can utilize disparate locations to accomplish various methods and/or functions. Communications between these disparate entities can include global communication means such as the Internet and the like. Often this type of communication means utilizes server and client relationships.
In
What has been described above includes examples of the embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of the embodiments are possible. Accordingly, the subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
Number | Name | Date | Kind |
---|---|---|---|
5999664 | Mahoney et al. | Dec 1999 | A |
6047284 | Owens et al. | Apr 2000 | A |
6304864 | Liddy et al. | Oct 2001 | B1 |
6353825 | Ponte | Mar 2002 | B1 |
6385602 | Tso et al. | May 2002 | B1 |
6418434 | Johnson et al. | Jul 2002 | B1 |
6418448 | Sarkar | Jul 2002 | B1 |
6460036 | Herz | Oct 2002 | B1 |
6519585 | Kohli | Feb 2003 | B1 |
6601075 | Huang et al. | Jul 2003 | B1 |
6631369 | Meyerzon et al. | Oct 2003 | B1 |
6636853 | Stephens, Jr. | Oct 2003 | B1 |
6665665 | Ponte | Dec 2003 | B1 |
6813616 | Simpson et al. | Nov 2004 | B2 |
6847977 | Abajian | Jan 2005 | B2 |
6907424 | Neal et al. | Jun 2005 | B1 |
6931595 | Pan et al. | Aug 2005 | B2 |
6944612 | Roustant et al. | Sep 2005 | B2 |
6996778 | Rajarajan et al. | Feb 2006 | B2 |
7058913 | Siegel et al. | Jun 2006 | B1 |
7062488 | Reisman | Jun 2006 | B1 |
7231395 | Fain et al. | Jun 2007 | B2 |
7346621 | Zhang et al. | Mar 2008 | B2 |
7383254 | Wen et al. | Jun 2008 | B2 |
7720830 | Wen et al. | May 2010 | B2 |
20020174089 | Tenorio | Nov 2002 | A1 |
20020198875 | Masters | Dec 2002 | A1 |
20030093423 | Larason et al. | May 2003 | A1 |
20030115193 | Okamoto et al. | Jun 2003 | A1 |
20030177118 | Moon et al. | Sep 2003 | A1 |
20030212663 | Leno | Nov 2003 | A1 |
20040034652 | Hofmann et al. | Feb 2004 | A1 |
20040181749 | Chellapilla et al. | Sep 2004 | A1 |
20040194141 | Sanders | Sep 2004 | A1 |
20040199497 | Timmons | Oct 2004 | A1 |
20050108200 | Meik et al. | May 2005 | A1 |
20050144158 | Capper et al. | Jun 2005 | A1 |
20050171946 | Maim | Aug 2005 | A1 |
20050192955 | Farrell | Sep 2005 | A1 |
20060026152 | Zeng et al. | Feb 2006 | A1 |
20060031211 | Mizuno | Feb 2006 | A1 |
20060031214 | Solaro et al. | Feb 2006 | A1 |
20060036567 | Tan | Feb 2006 | A1 |
20060074881 | Vembu et al. | Apr 2006 | A1 |
20060080353 | Miloushev et al. | Apr 2006 | A1 |
20060098871 | Szummer | May 2006 | A1 |
20060101060 | Li et al. | May 2006 | A1 |
20060167928 | Chakraborty et al. | Jul 2006 | A1 |
20060253437 | Fain et al. | Nov 2006 | A1 |
20070033171 | Trowbridge | Feb 2007 | A1 |
20070150486 | Wen et al. | Jun 2007 | A1 |
20080027910 | Wen et al. | Jan 2008 | A1 |
Number | Date | Country |
---|---|---|
1158422 | Nov 2001 | EP |
0057311 | Sep 2000 | WO |
0073942 | Dec 2000 | WO |
Number | Date | Country |
---|---|---|
20080033915 A1 | Feb 2008 | US |