1. Field of the Invention
The present invention relates to techniques for detecting similar objects. More specifically, the present invention relates to a method and apparatus for performing inference detection based on web advertisements.
2. Related Art
Information retrieval (IR) is the science of searching for and retrieving information. One specific challenge for information retrieval systems is detecting inference between terms. Advancing inference-detection methods is important for taking advantage of new information media.
Known inferences between terms can be leveraged in various applications, including Internet search engines and document categorization. For example, known inferences allow a plurality of documents to be categorized even when they do not share exact words or phrasing. In a second example, known similarities between a user's Internet search query and a given web page can facilitate ranking of search results when these results are relevant but not an exact match for the query term.
Much research has focused on developing new methods for measuring inference between terms by using hyperlinks or keyword overlap. However, these existing systems typically detect inference based on a variety of term attributes that are created for independent purposes, and therefore do not guarantee accurate correlation between the objects.
One embodiment of the present invention provides a system that performs inference detection based on Internet advertisements. In doing so, this system first receives a set of topic words, performs a search query on each topic word using a search engine, and gathers a set of Uniform Resource Locators (URLs) associated with sponsored advertisement from the search results corresponding to each search query. Then, the system determines a correlation between two topic words based on their corresponding URLs associated with sponsored advertisement, and produces a result which indicates groups of correlated topic words.
In a variation on this embodiment, the system determines a correlation between two topic words based on their corresponding URLs associated with sponsored advertisement by generating a bipartite graph based on the topic words and the URLs associated with sponsored advertisement. This bipartite graph includes a first set of vertices associated with one or more topic words, a second set of vertices associated with one or more URLs associated with sponsored advertisement, and a number of edges, wherein a respective edge of the bipartite graph associates a URL for sponsored advertisement with a corresponding topic word. Next, the system generates a first collapsed graph from the bipartite graph by collapsing the vertices associated with the URLs for sponsored advertisement into a number of edges between topic words. The first collapsed graph includes an edge between two topic words if the bipartite graph has edges between the corresponding topic words and an advertisement URL. Then, the system uses the first collapsed graph to cluster topic words into groups that are joined by at least a predetermined number of edges.
In a further variation, the system clusters the URLs associated with sponsored advertisement into groups by generating a second collapsed graph from the bipartite graph. In doing so, the system first collapses the vertices associated with the topic words into a number of edges between URLs associated with sponsored advertisement. The second collapsed graph includes an edge between two URLs associated with sponsored advertisement if the bipartite graph has edges between the corresponding URLs associated with sponsored advertisement and a topic word. Then, the system uses the second collapsed graph to cluster the URLs associated with sponsored advertisement into groups that are joined by at least a predetermined number of edges, and produces a result which indicates the clusters of URLs associated with sponsored advertisement.
In a variation on this embodiment, the system determines a correlation between two topic words based on their corresponding URLs associated with sponsored advertisement by using a similarity metric computation to rank pairs of topic words, and clustering the topic words into groups whose similarity metric computation is greater than or equal to a predetermined value.
In a further variation, the similarity metric is computed based on a Jaccard index or a cosine similarity.
In a variation on this embodiment, the system determines the correlation between two topic words based on their corresponding URLs associated with sponsored advertisement by gathering location-based information associated with a respective topic word from the URLs associated with sponsored advertisement, and clustering the topic words into groups that have location-based information in common.
In a further variation, gathering location-based information from the URLs associated with sponsored advertisement involves searching the title of the URL for location-based information associated with the topic word.
In a further variation, gathering location-based information from the URLs associated with sponsored advertisement involves searching for location-based information in a web page referenced by the URL associated with the topic word.
In a further variation, the location-based information includes one or more of: a neighborhood, city, county, state, and a country.
In a further variation, the location-based information includes neighboring cities and neighboring points of interest related to the topic word.
In a variation on this embodiment, a respective topic word corresponds to a product name, and a respective group of topic words is associated with a group of related products.
In a variation on this embodiment, the system filters the set of URLs associated with sponsored advertisement to remove URLs produced by known aggregators.
TABLE 1 illustrates a number of brands for a set of categories, and an average number of sponsored hyperlinks per brand for these categories, in accordance with an embodiment of the present invention.
TABLE 2 illustrates exemplary clothing recommendations in accordance with an embodiment of the present invention.
TABLE 3 illustrates exemplary baby equipment recommendations in accordance with an embodiment of the present invention.
TABLE 4 illustrates exemplary shoe recommendations in accordance with an embodiment of the present invention.
TABLE 5 illustrates exemplary luggage recommendations in accordance with an embodiment of the present invention.
TABLE 6 illustrates a comparison of the Amazon.com product recommendations with recommendations generated by an inference detection system in accordance with embodiments of the present invention.
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed.
The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.
Furthermore, the methods and processes described below can be included in hardware modules. For example, the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed. When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.
Online advertisements associated with one or more topic words are typically created based on the correlation between the advertisement content and the topic words. For example, a furniture retailer may purchase advertisement space associated with the keywords “leather” and “ottoman” with an Internet search engine because it recognizes a strong correlation between these terms and the interests of its target customers.
Embodiments of the present invention provide a system for mining collective advertiser intelligence to detect inference between terms. This system builds on the observation that if Internet searches based on term A and term B independently result in a given sponsored advertisement hyperlink C, then this correlation is an indication of an inference between A and B. In some embodiments, this system can generate commercial product recommendations for a given retailer by detecting similarities between products. Unlike typical recommender systems, the present system can be completely automated, requires no a priori knowledge of a retailer's brands, and uses publicly available data. Hence, embodiments of the present invention do not require a retailer to maintain and mine a large database of private data.
Advertisers 106-110 can include any node with computational capability and a communication mechanism for communicating with search engines 112-114 through network 104. Advertisers 106-110 can buy a number of advertising spaces or opportunities (e.g., hyperlinks) associated with searches based on topic words from search engines 112-114. In doing so, advertisers 106-110 can associate a set of topic words with a common hyperlink based on the knowledge of the correlation between the topic words and the advertisement content.
Search engines 112-114 can include any node with computational capability and a communication mechanism for communicating with inference detection system 102 and advertisers 106-110 through network 104. In some embodiments of the present invention, search engines 112-114 include Internet search engines. Search engines 112-114 receive from any of advertisers 106-110 a number of topic words associated with a given hyperlink which links to an online advertisement. Search engines 112-114 use this relationship information to present a set of sponsored hyperlinks when displaying search results corresponding to the search topic word.
Inference detection system 102 can include any node with computational capability and a communication mechanism for communicating with search engines 112-114 through network 104. Inference detection system 102 detects inference between topic words by performing a search query for a respective topic word on one or more of search engines 112-114 to retrieve a corresponding collection of sponsored hyperlinks which are presented with the search results. Inference detection system 102 then uses the mapping between topic words and search query results to group these topic words into clusters or to group the sponsored hyperlinks into clusters.
At one end of search engine 206, advertisers 202-204 purchase from search engine 206 the advertising opportunity associated with a number of topic words 214 to present hyperlink 216. In response, search engine 206 registers hyperlink 216 as a sponsored hyperlink 220 which can appear in a search query's result when the search query includes any one of the associated topic words. In essence, advertisers 202-204 provide valuable relationship information for topic words 214 by associating topic words 214 with sponsored hyperlink 220. Note that sponsored hyperlink 220 is associated with a Uniform Resource Locator (URL) that is displayed with a respective online advertisement, and not necessarily with the actual link produced for the online advertisement.
At the opposing end of search engine 206, inference detection system 208 infers relationships between topic words 218 by submitting a search query on a respective search engine 206 for a given topic word 218, and retrieving a corresponding collection of sponsored hyperlinks 220. First, receiving mechanism 222 receives a number of topic words 218. In some embodiments, receiving mechanism 222 can receive topic words 218 from a file and/or a user. Search engine interface 224 performs a search query on a respective topic word 218 using a search engine 206, and gathers a set of sponsored hyperlinks from the search results. Analysis mechanism 226 groups these topic words 218 into topic-word clusters 210 by matching topic words 218 with sponsored hyperlinks 220. In some variations, analysis mechanism 226 groups sponsored hyperlinks 220 into clusters 212 by matching sponsored hyperlinks 220 with topic words 218.
In some embodiments of the present invention, analysis mechanism 226 generates topic word clusters 210 and sponsored hyperlink clusters 212 by performing a graph-based analysis of the relationships between topic words 218 and sponsored hyperlinks 220. This graph-based analysis is illustrated in
In some embodiments of the present invention, the analysis mechanism performs inference detection on a set of topic words and sponsored hyperlinks by analyzing a directed relationship graph.
Inferences are detected between topic words by analyzing the associations between these topic words based on the sponsored hyperlinks that they have in common. Advertisers may purchase topic words associated with the brands they sell (e.g. the advertiser automall.com buying the keywords “Kia”, “Toyota”, “Honda”, etc.), or the brands of competitors (for example, on May 7, 2008, the search term “Kia” returned a sponsored hyperlink for Toyota (www.ToyotaRetail.com) on Yahoo!).
The example illustrated in
In some embodiments, the relationships between topic words and sponsored hyperlinks are leveraged to make recommendations of brands and/or products. The term “brand” refers to a name that is associated with a collection of goods (e.g. “Toyota”), and the term “product” refers to a specific good (e.g. “Toyota Prius”). For example, TABLE 6 lists products, whereas TABLES 2-5 list brands. Recommendations are calculated by associating a set of sponsored hyperlinks with each brand (product). For a given brand (product), the other brands (products) are ranked using a similarity measure, thus generating a ranked set of recommendations.
In variations of these embodiments, a similarity metric is used to rank brands (products). A similarity metric is a function that estimates the similarity between two items. In some of these variations, the Jaccard similarity measure (also known as the Jaccard index) is used to rank brands (products). The Jaccard similarity measure calculates the similarities between two sets, A and B, which are subsets of a set C. The Jaccard index of A and B is the ratio of the number of elements that sets A and B have in common, to the number of elements in at least one of A or B:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
In other variations, brands (products) are ranked based on a cosine similarity, or a Hamming distance. Details of Jaccard index and cosine similarity can be found at http://en.wikipedia.org/wiki/Jaccard_index, which is incorporated by reference herein.
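As a minimal sketch of the similarity measures named above, the following Python treats the sponsored hyperlinks gathered for each brand as a set. The function names, the convention of returning 0.0 when a set is empty, and the placeholder URLs are assumptions made for illustration and are not part of the described embodiments.

```python
import math

def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty (assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine_similarity(a, b):
    """Cosine of the binary indicator vectors of two sets: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def hamming_distance(a, b):
    """Number of elements that appear in exactly one of the two sets."""
    return len(a ^ b)

# Hypothetical sponsored-hyperlink sets for two brands.
brand_a = {"shop1.example", "shop2.example", "shop3.example"}
brand_b = {"shop1.example", "shop2.example"}

print(jaccard_index(brand_a, brand_b))      # 2 shared out of 3 distinct URLs
print(cosine_similarity(brand_a, brand_b))  # 2 / sqrt(3 * 2)
print(hamming_distance(brand_a, brand_b))   # 1 URL is not shared
```

Note that the Jaccard index and cosine similarity grow with overlap, whereas the Hamming distance shrinks, so a ranking based on Hamming distance would sort in the opposite direction.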
The following paragraphs present an exemplary application of the inference detection system. This example performs inference detection across a number of brands that span four categories: clothing, baby equipment, shoes, and luggage.
TABLE 1 illustrates a number of brands for a set of categories, and an average number of sponsored hyperlinks per brand for these categories, in accordance with an embodiment of the present invention. For this example, a collection of 350 brands was gathered from online department stores like Nordstrom and Amazon, and includes baby equipment (i.e. cribs, strollers, etc.), shoes, clothes and luggage. To define a ground truth, the brands are grouped into the category that appears to be most closely associated with the brand according to the results of search engine queries on the brands. Most of the brands naturally fall into a single category (e.g. “Graco” in baby equipment and “Ameribag” in luggage). However, there are some cases of brands that branched into multiple categories, for which a category is selected based on the brand's main product line.
For each brand, a single Internet search query is issued on the brand name, and the resulting sponsored hyperlinks are gathered. Each Internet search query is made through a URL formatted to be unlinked to any user account to prevent sponsored hyperlinks from being overtly biased by user history. For example, a URL for the “Graco” search query on Google is: <http://www.google.com/search?q=graco>.
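The query URL format above can be sketched as follows. This is an illustration only: the helper name `build_query_url` is hypothetical, lower-casing the brand name is an assumption made to match the example URL, and the sketch builds the URL without issuing any request.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in the text.
SEARCH_ENDPOINT = "http://www.google.com/search"

def build_query_url(topic_word):
    """Build a query URL that carries only the search term, with no account-specific parameters."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": topic_word.lower()})

print(build_query_url("Graco"))
print(build_query_url("Mountain Buggy"))  # spaces are encoded as '+'
```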
Next, a filtering operation is performed on the collection of gathered sponsored hyperlinks to exclude known aggregators and department stores (e.g. Shopzilla.com, Shopping.com, Jcpenney.com, Amazon.com, Nordstrom.com). It is important to remove aggregators from the collection of sponsored hyperlinks because they could cause a similarity to be detected between unrelated brands.
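One possible form of this filtering operation is sketched below. The blocklist contains the sites named above; the helper names `site_of` and `remove_aggregators`, the exact-host matching rule (subdomains other than "www." are not matched), and the sample URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Aggregators and department stores named in the text.
KNOWN_AGGREGATORS = {
    "shopzilla.com", "shopping.com", "jcpenney.com", "amazon.com", "nordstrom.com",
}

def site_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url  # lets urlsplit recognize the host of a scheme-less URL
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def remove_aggregators(urls, blocked=KNOWN_AGGREGATORS):
    """Keep only the sponsored hyperlinks whose host is not on the blocklist."""
    return [url for url in urls if site_of(url) not in blocked]

# Hypothetical sponsored hyperlinks gathered for one brand.
gathered = [
    "http://www.Shopzilla.com/graco",
    "www.babiesrus.com/graco",
    "https://amazon.com/strollers",
]
print(remove_aggregators(gathered))
```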
The brand names are then associated with the complete URLs of the gathered sponsored hyperlinks, and a similarity metric is calculated to measure inference between two respective brands. In some embodiments of the present invention, the Jaccard index is used as the similarity metric. In other embodiments, inference is measured based on a cosine similarity, or based on a Hamming distance. Using the Jaccard index can be advantageous because a wide variety of brands create a very sparse similarity matrix for which more intuitive measures of similarity (e.g. Hamming distance) can generate results that are less easy to interpret. The Jaccard index, on the other hand, provides a similarity index for any two respective brands. Therefore, to obtain product recommendations around a given brand, the Jaccard index can be used to compute an inference between the given brand and the other brands that it shares sponsored hyperlinks with. After the Jaccard index is computed around a given brand, the recommendations are ranked by their Jaccard index.
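The ranking step described above might look like the following sketch. The function name `recommend`, the `min_score` parameter, the tie-breaking by name, and the brand and URL placeholders are all hypothetical; the sets shown are invented for illustration and are not the experimental data.

```python
def recommend(brand, links_by_brand, min_score=0.0):
    """Rank the other brands that share at least one sponsored hyperlink with `brand`
    by Jaccard index, highest first (ties broken alphabetically)."""
    own = links_by_brand[brand]
    ranked = []
    for other, links in links_by_brand.items():
        if other == brand:
            continue
        shared = own & links
        if not shared:
            continue  # only brands with a shared hyperlink are considered
        score = len(shared) / len(own | links)
        if score >= min_score:
            ranked.append((other, score))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked

# Invented data: brand placeholder -> set of sponsored-hyperlink URLs.
links_by_brand = {
    "stroller_a": {"u1", "u2", "u3"},
    "stroller_b": {"u2", "u3", "u4"},
    "crib_c": {"u3"},
    "bag_d": {"u9"},
}
print(recommend("stroller_a", links_by_brand))
print(recommend("stroller_a", links_by_brand, min_score=0.4))
```

Here "bag_d" never appears in the output because it shares no hyperlink with "stroller_a", which reflects the sparsity of the similarity matrix noted above.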
The resulting sponsored hyperlinks from this example are sparse, with an average of 0.05 shared sponsored hyperlinks between different products. The average number of sponsored hyperlinks per brand is 3.65 with a maximum of 10 sponsored hyperlinks for the brand, Mountain Buggy. The number of shared sponsored hyperlinks is only slightly higher within a category, indicating that there is not a large overlap among the keywords purchased by advertisers in a given category.
In some embodiments of the present invention, the system performs category matching to evaluate the usefulness of a product recommendation. That is, if a recommendation falls into the same category as the initial brand then it is a good recommendation in some broad sense. Out of the top product recommendations for each brand with a Jaccard index of at least 0.125 (which yields 8 recommendations per product on average), 85% fell in the correct category. For the product recommendations with a Jaccard index of at least 0.2 (which yields 2-3 recommendations per product on average), more than 96% are correctly categorized.
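The category-matching evaluation can be sketched as the fraction of recommendations at or above a Jaccard threshold that fall in the same category as the initial brand. The function name `category_precision`, the return value of `None` when no recommendation meets the threshold, and the sample data are assumptions; the numbers below are invented and do not reproduce the 85% and 96% figures reported above.

```python
def category_precision(recommendations, category_of, threshold):
    """Fraction of recommendations with score >= threshold whose category
    matches that of the brand they were generated for."""
    hits = 0
    total = 0
    for brand, ranked in recommendations.items():
        for other, score in ranked:
            if score < threshold:
                continue
            total += 1
            if category_of[other] == category_of[brand]:
                hits += 1
    return hits / total if total else None

# Invented recommendations: brand -> list of (recommended brand, Jaccard index).
recommendations = {
    "stroller_a": [("stroller_b", 0.5), ("bag_d", 0.13)],
    "shoe_e": [("shoe_f", 0.25)],
}
category_of = {
    "stroller_a": "baby equipment", "stroller_b": "baby equipment",
    "bag_d": "luggage", "shoe_e": "shoes", "shoe_f": "shoes",
}
print(category_precision(recommendations, category_of, 0.125))  # 2 of 3 match
print(category_precision(recommendations, category_of, 0.2))    # 2 of 2 match
```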
The following paragraphs illustrate information for some of the results obtained from the example. These results span four product categories, namely: clothing, baby equipment, shoes, and luggage.
TABLE 2 illustrates exemplary clothing recommendations in accordance with an embodiment of the present invention. For the 171 clothing brands, the average number of sponsored hyperlinks is 3.2, and the average number of shared hyperlinks between distinct brands is 0.07. The most common incorrect recommendations occurred for luggage brands that include a large line of handbags (e.g. Elliott & Lucca). The maximum number of shared sponsored hyperlinks between distinct clothing brands is 4 (e.g. Marc Jacobs and Moschino), but there are only 4 such pairs of brands with such a large overlap. This category had some strongly dominant advertisers. For example, www.designerapparels.com purchased 33 distinct clothing brand names.
TABLE 3 illustrates exemplary baby equipment recommendations in accordance with an embodiment of the present invention. For the 99 baby equipment brands, the average number of sponsored hyperlinks is 3.56, and the average number of shared hyperlinks between distinct brands is 0.07. Miscategorized recommendations are rare for baby equipment, possibly because brands in this group are unlikely to offer products in the other categories. Furthermore, some popular brands were apparent. For example, the brand Zooper had 7 sponsored hyperlinks, and the inference detection system used those hyperlinks to detect similarity with 23 other brands. BabiesRUs.com was the largest advertiser in this category, with ads hyperlinked to 15 different brands in the experimental set of 350 brands.
TABLE 4 illustrates exemplary shoe recommendations in accordance with an embodiment of the present invention. For the 67 shoe brands, the average number of sponsored hyperlinks is 5.12, and the average number of shared hyperlinks between distinct brands is 0.3. The most common miscategorized recommendations are for clothing and luggage brands, due to the large number of cross-over brands in each category (e.g. bag makers who also produce shoes). This category had the largest number of purchased ads by a single advertiser, as www.zappos.com purchased 39 of the brand names in the collection.
TABLE 5 illustrates exemplary luggage recommendations in accordance with an embodiment of the present invention. For the 13 luggage brands, the average number of sponsored hyperlinks is 2.9, and the average number of shared hyperlinks between distinct brands is 0.07. Due to the small sample size, additional useful statistics could not be inferred.
TABLE 6 illustrates a comparison of the Amazon.com product recommendations on May 13, 2008 with recommendations generated by the inference detection system of embodiments of the present invention. Results from the inference detection system indicate that even a large online retailer like Amazon.com may benefit from harvesting collective advertiser intelligence. Amazon provides its customers with product recommendations under a heading, “Customers who viewed this item also viewed.” However, for the stroller “BOB Revolution Duallie,” Amazon.com does not provide its customers with a wide variety of competing brands of strollers as most are for other BOB strollers. In particular, of the top 11 Amazon.com recommendations, 7 are for BOB strollers. In contrast, the top 11 recommendations using the inference detection system include 7 new (i.e. non-BOB) stroller products. Results from the inference detection system include a wider variety of stroller brands and may be more useful to a consumer, as the consumer is already aware of BOB strollers.
In some embodiments of the present invention, the analysis mechanism performs inference detection on a set of topic words and sponsored hyperlinks by analyzing a bipartite graph.
To perform an inference analysis, the inference detection system can collapse a bipartite graph 310 into two types of collapsed graphs. A first collapsed graph depicts the relationship between topic words, and a second collapsed graph depicts the relationship between sponsored hyperlinks. These two collapsed graphs are illustrated under
In some embodiments of the present invention, the inference detection system uses collapsed graph 320 to determine whether two respective topic words are related. In other embodiments, the inference detection system uses the Jaccard index to determine whether two respective topic words are related.
A relationship exists between two topic words when a strong link is detected between the two topic words. In some embodiments, a strong link is said to exist between two topic words when at least a predetermined number of edges couple their corresponding vertices in collapsed graph 320. In other embodiments, a strong link is said to exist between two topic words when their computed Jaccard index is greater than a predetermined value. In other words, a strong link exists between two topic words when the two topic words are categorized into a group by at least a predetermined number of advertisers (i.e., sponsored hyperlinks). For example, collapsed graph 320 includes a strong link 322 formed by two edges, which results from two sponsored hyperlinks (i.e., SL2 and SL3) associating topic words W1 and W2 into the same group.
If fewer than the predetermined number of edges couple the two corresponding vertices in collapsed graph 320, then a weak link is present. For example, collapsed graph 320 includes a weak link 324 formed by a single edge, which results from a single sponsored hyperlink (i.e., SL4) associating topic words W2 and W3 into the same group.
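The collapse of the bipartite graph onto its topic-word vertices, and the strong/weak distinction, can be sketched as follows. The edge set in `WORD_TO_LINKS` is an assumption chosen to be consistent with the examples given in the text (SL2 and SL3 joining W1 and W2, SL4 joining W2 and W3); it is not taken from the drawings. The function names and the threshold of two edges are likewise illustrative.

```python
from itertools import combinations

# Assumed bipartite graph: topic word -> sponsored hyperlinks returned for it.
WORD_TO_LINKS = {
    "W1": {"SL1", "SL2", "SL3"},
    "W2": {"SL2", "SL3", "SL4"},
    "W3": {"SL4", "SL5"},
    "W4": {"SL4", "SL5"},
}

def collapse(bipartite):
    """Collapse the right-hand vertices into edges: each pair of left-hand vertices
    is mapped to the number of right-hand vertices they share (pairs sharing none are omitted)."""
    counts = {}
    for a, b in combinations(sorted(bipartite), 2):
        shared = len(bipartite[a] & bipartite[b])
        if shared:
            counts[(a, b)] = shared
    return counts

def classify_links(counts, min_edges=2):
    """Label each pair 'strong' if it is joined by at least min_edges edges, else 'weak'."""
    return {pair: "strong" if n >= min_edges else "weak" for pair, n in counts.items()}

edge_counts = collapse(WORD_TO_LINKS)
print(edge_counts)
print(classify_links(edge_counts))
```

Under this assumed edge set, W2 and W4 are also joined by a weak link through SL4, a consequence of the chosen data rather than something stated in the text.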
The inference detection system uses strong links in collapsed graph 320 to group topic words into clusters. A cluster of topic words is formed by including a number of topic words that are joined together by a set of strong links. For example, cluster 326 is formed based on a strong link that groups topic words W1 and W2 together, and cluster 328 is formed based on a strong link that groups topic words W3 and W4 together.
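One simple way to realize this grouping, offered here as an assumption rather than as the clustering used by the described embodiments, is to take the connected components of the collapsed graph after discarding weak links. The function name, the threshold of two edges, and the edge counts (which follow from an edge set assumed to match the W1-W4 examples) are illustrative.

```python
def cluster_by_strong_links(edge_counts, min_edges=2):
    """Group vertices into connected components using only pairs joined by
    at least min_edges edges; vertices without any strong link are left out."""
    adjacency = {}
    for (a, b), count in edge_counts.items():
        if count >= min_edges:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    clusters = []
    seen = set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # depth-first traversal over strong links
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters

# Assumed edge counts of a collapsed topic-word graph.
edge_counts = {("W1", "W2"): 2, ("W2", "W3"): 1, ("W2", "W4"): 1, ("W3", "W4"): 2}
print(cluster_by_strong_links(edge_counts))
```

With these counts the weak links through W2 are ignored, so W1 and W2 form one cluster and W3 and W4 form another, matching clusters 326 and 328 in the example.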
In some embodiments, the inference detection system only considers sponsored hyperlinks that are representative of a category subject when grouping the topic words into clusters. In a variation of these embodiments, the inference detection system performs a filtering process that removes sponsored hyperlinks which correspond to known aggregators. An aggregator is a sponsored hyperlink associated with a set of topic groups which span two or more separate categories.
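Where category labels for the topic words are available, the definition of an aggregator given above suggests a simple check, sketched here under that assumption. The function name `find_aggregators`, the `min_categories` parameter, and the sample data are hypothetical.

```python
def find_aggregators(word_to_links, category_of, min_categories=2):
    """Return the sponsored hyperlinks whose topic words span at least
    min_categories distinct categories."""
    categories_by_link = {}
    for word, links in word_to_links.items():
        for link in links:
            categories_by_link.setdefault(link, set()).add(category_of[word])
    return {
        link for link, categories in categories_by_link.items()
        if len(categories) >= min_categories
    }

# Invented data: two topic words from different categories share one hyperlink.
word_to_links = {
    "stroller_a": {"babyshop.example", "megamall.example"},
    "shoe_b": {"shoeshop.example", "megamall.example"},
}
category_of = {"stroller_a": "baby equipment", "shoe_b": "shoes"}
print(find_aggregators(word_to_links, category_of))
```

A threshold of two categories would also flag advertisers that legitimately sell cross-over brands, so a higher `min_categories` value may be preferable in practice.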
Any clustering algorithm can be used to partition collapsed graph 320 into a set of clusters. For example, in some embodiments, the system can perform a bottom-up clustering operation which recursively aggregates topic words into clusters. In other embodiments, the system can perform a top-down clustering operation which recursively partitions collapsed graph 320 across weak links to form a number of focused clusters.
The inference detection system uses collapsed graph 330 to determine whether two respective sponsored hyperlinks are related. A relationship exists between two sponsored hyperlinks when a strong link is detected between the two sponsored hyperlinks. In some embodiments, a strong link is said to exist between two sponsored hyperlinks when at least a predetermined number of edges couple their corresponding vertices in collapsed graph 330. In other embodiments, a strong link is said to exist between two sponsored hyperlinks when a computed Jaccard index between the two sponsored hyperlinks is greater than a predetermined value. In other words, a strong link exists between two sponsored hyperlinks when the two sponsored hyperlinks are associated with at least a predetermined number of common topic words. For example, collapsed graph 330 includes a strong link 332 formed by two edges, which results from two topic words (i.e., W3 and W4) associating sponsored hyperlinks SL4 and SL5 into the same group.
If fewer than the predetermined number of edges couple the two corresponding vertices in collapsed graph 330, then a weak link is present. For example, collapsed graph 330 includes a weak link 334 formed by a single edge, which results from a single topic word (i.e., W1) associating sponsored hyperlinks SL1 and SL2 into the same group.
The inference detection system uses strong links in collapsed graph 330 to group sponsored hyperlinks into clusters. A cluster of sponsored hyperlinks is formed by including a number of sponsored hyperlinks that are joined together by a set of topic words. For example, cluster 336 is formed based on a strong link that groups sponsored hyperlinks SL4 and SL5 together, and cluster 338 is formed based on a strong link that groups sponsored hyperlinks SL2 and SL3 together.
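The hyperlink-side analysis can be sketched by inverting the bipartite graph so that each sponsored hyperlink maps to its topic words, and then counting shared topic words per pair of hyperlinks. The edge set below is an assumption chosen to agree with the SL and W examples in the text, and the function names and threshold are illustrative.

```python
from itertools import combinations

# Assumed bipartite graph: topic word -> sponsored hyperlinks.
WORD_TO_LINKS = {
    "W1": {"SL1", "SL2", "SL3"},
    "W2": {"SL2", "SL3", "SL4"},
    "W3": {"SL4", "SL5"},
    "W4": {"SL4", "SL5"},
}

def invert(word_to_links):
    """Turn a word -> links mapping into a link -> words mapping."""
    link_to_words = {}
    for word, links in word_to_links.items():
        for link in links:
            link_to_words.setdefault(link, set()).add(word)
    return link_to_words

def strong_pairs(bipartite, min_edges=2):
    """Pairs of left-hand vertices that share at least min_edges right-hand vertices."""
    return [
        (a, b) for a, b in combinations(sorted(bipartite), 2)
        if len(bipartite[a] & bipartite[b]) >= min_edges
    ]

link_to_words = invert(WORD_TO_LINKS)
print(strong_pairs(link_to_words))
```

Under this edge set the strong pairs are SL2 with SL3 (through W1 and W2) and SL4 with SL5 (through W3 and W4), which corresponds to clusters 338 and 336 in the example.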
Any clustering algorithm can be used to partition collapsed graph 330 into a set of clusters. For example, in some embodiments, the system can perform a bottom-up clustering operation which recursively aggregates sponsored hyperlinks into clusters. In other embodiments, the system can perform a top-down clustering operation which recursively partitions collapsed graph 330 across weak links to form a number of focused clusters.
The system then uses the bipartite graph to generate a collapsed graph by collapsing the sponsored-hyperlink vertices of the bipartite graph (operation 410). Once the collapsed graph is generated, the system determines the strong links in the collapsed graph (operation 412), and gathers topic words joined by strong links into a respective cluster (operation 414). Finally, the system returns the clusters of topic words (operation 416).
In some embodiments, a topic word corresponds to a product name, and a given cluster is associated with a group of related products. In some variations of these embodiments, a product name can include a brand name. The term “brand” refers to a name that is associated with a collection of goods, and the term “product” refers to a specific good.
In these embodiments, the inference detection system can leverage sponsored hyperlinks to establish relationships between products, or to establish relationships between advertisers. For example, inference detection can be applied to a set of topic words associated with a set of products, where the inference detection system creates a product-based bipartite graph that depicts the relationships between a set of products and a corresponding set of advertisers.
Further inference analysis on this bipartite graph can reveal a collection of product clusters, where a respective product cluster depicts a collection of products from related markets, or a collection of products from a common manufacturer. For example, a cluster for the set of product names {“Mountain Buggy”, “Kool Stop”, “Pepenny”} can be created based on sponsored hyperlinks associated with the baby product industry, such as “strollers.com.”
Furthermore, inference analysis on the product-based bipartite graph can reveal a collection of advertiser clusters, where a respective advertiser cluster depicts a collection of corporations that are related by industry. For example, a cluster for the set of sponsored hyperlinks {“zappos.com”, “shoemall.com”, “piperlime.com”} can be created based on the set of product names from the shoe industry {“Asics”, “Naturalizer”, “Skechers”}. This inference indicates that zappos.com, shoemall.com, and piperlime.com are competing vendors in the shoe industry.
In some embodiments of the present invention, the inference detection system gathers location-based information associated with a given topic word. In these embodiments, the inference detection system draws an inference between location-based information and a topic word based on the corresponding sponsored hyperlinks from the search query results. For example, inference detection can be applied to a set of topic words associated with a city, a business, a landmark, or any topic associated with a geographical location. The inference detection system can form a bipartite graph that depicts the relationships between a set of topic words and a corresponding set of geographical locations that are identified by the sponsored hyperlink.
In one variation of these embodiments, the inference detection system searches the title of a respective sponsored hyperlink for location-based information associated with the given topic word. In another variation of these embodiments, the system searches for location-based information in a web page referenced by a respective sponsored hyperlink.
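A minimal sketch of searching a title or page text for location-based information is shown below. It assumes a predefined list of place names (a gazetteer), which the text does not specify; the function name `find_locations`, the gazetteer contents, and the sample title are hypothetical.

```python
import re

# Hypothetical gazetteer of place names to look for.
GAZETTEER = ["San Francisco", "Bay Area", "California"]

def find_locations(text, gazetteer=GAZETTEER):
    """Return the gazetteer entries that occur in the text as whole words,
    ignoring case, in gazetteer order."""
    found = []
    for place in gazetteer:
        pattern = r"\b" + re.escape(place) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(place)
    return found

# Hypothetical title of a sponsored hyperlink returned for a topic word.
title = "Alcatraz Tours - San Francisco, California"
print(find_locations(title))
```

The same function could be applied to the text of the web page referenced by the sponsored hyperlink, and the resulting topic-word-to-location mapping could then be analyzed in the same way as a topic-word-to-hyperlink mapping.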
Further inference analysis on this bipartite graph can reveal a collection of topic-word clusters, where a respective topic-word cluster includes a collection of topics that are in close proximity to one another. For example, a cluster for the set of topic words {“Golden Gate”, “Alcatraz”} can be created based on the set of identified geographical information they have in common {“San Francisco”, “Bay Area”, “California”}.
Furthermore, inference analysis on the location-based bipartite graph can reveal a collection of geographical location clusters, where a respective geographical location cluster includes a collection of locations that share a common theme or business. For example, a cluster for the set of geographical locations {“California”, “Orlando”, “Florida”} can be created based on the set of topic words {“Disney”, “Theme Park”}. This inference indicates that California and Florida are two locations for Disney theme parks.
In some embodiments, the location-based information includes one or more of: a neighborhood, city, county, state, and a country. Furthermore, in some embodiments, the location-based information includes neighboring cities and neighboring points of interest related to the given topic word.
Storage device 608 stores an operating system 616, an inference detection system 618, topic words 622, sponsored hyperlinks 624, bipartite graphs 626, collapsed graphs 628, and clusters 630. In one embodiment, inference detection system 618 includes a graphical user interface (GUI) module 620.
During operation, inference detection system 618 is loaded from storage device 608 into memory 606 and executed by processor 604. Inference detection system 618 takes topic words 622 as input, and retrieves sponsored hyperlinks 624 from search engine 614 by performing a search query for topic words 622. Inference detection system 618 uses the relationships between topic words 622 and sponsored hyperlinks 624 to generate bipartite graphs 626 and collapsed graphs 628. Inference detection system 618 uses collapsed graphs 628 to infer relationships between topic words, and to infer relationships between sponsored hyperlinks, and creates clusters 630 which denote the groups of related topic words and the groups of related sponsored hyperlinks.
The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. They are not intended to be exhaustive or to limit the present invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the present invention is defined by the appended claims.