The present invention relates to the field of computer science. More particularly, the present invention relates to discovering relevant concepts and context for content nodes to determine a user's intent, and using this information to provide targeted advertisement and content.
Information retrieval systems are typically designed to retrieve relevant content from a data repository, based on inputs from users. The user input can be in any of the following example forms: (i) a set of keywords, (ii) single or multiple lists of URLs and domains, and (iii) a set of documents (e.g., text files, HTML pages, or other types of markup language content). A goal of such information retrieval systems is to pull the most relevant content (i.e., most relevant to the given input) from the underlying repository, which might itself consist of a heterogeneous set of structured and unstructured content. An example of such an information retrieval system is a traditional search engine, where a user provides a set of keywords, and the search engine returns a ranked list of the most relevant web pages and a separate list of the most relevant paid listings or sponsored links. The set of web pages matching a user's search queries and the advertisement database containing sponsored advertising materials are currently two separate databases that are processed very differently to pull the relevant pages and the sponsored links for the same user query. Thus, the conventional search engine described above provides an example of two distinct information repositories being processed in response to the same query.
Current systems identify important keywords on a web page and then try to expand them using various resources. This expanded set of keywords is compared with a user-provided set of keywords. One problem with such an approach is that keywords can have different meanings. For example, “Chihuahua” is a dog breed, but it is also a province in Mexico. In current systems, “Chihuahua” may expand to:
A person interested in a Chihuahua dog would find information about the Chihuahua province or travel to it less useful. And a person interested in the Chihuahua province would find information about dog training or a Chihuahua dog less useful. Without knowing the context of the user-provided set of keywords, current systems often present search results that are irrelevant to what the user is seeking.
While the aforementioned systems allow for limited targeting of advertisement and content, such systems fail to provide efficient targeted advertisement avenues. Accordingly, a need exists for an improved solution for advertisement targeting.
The content in a content node is expanded into groupings of concepts and phrases, where each such group represents one possible user intention (as implied by the query phrase or keyword). Each such grouping is analyzed to provide relevant content, such as unstructured data like World Wide Web data, categorized data, display advertisements, and paid listings. This more accurately reflects user intentions even for cases where click-through information is absent.
A computerized system finds important keywords on a content node using the content of the node and other related URLs, such as domains. The system clusters and prunes these keywords and phrases by projecting them onto a predefined conceptual map. The projection onto the conceptual map enables the expansion of the user intention into multiple contexts, and the further identification of content relevant to the original content node.
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more embodiments of the present invention and, together with the detailed description, serve to explain the principles and implementations of the invention.
In the drawings:
Embodiments of the present invention are described herein in the context of discovering relevant concepts and context for content nodes to determine a user's intent, and using this information to provide targeted advertisement and content. Those of ordinary skill in the art will realize that the following detailed description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the present invention will readily suggest themselves to such skilled persons having the benefit of this disclosure. Reference will now be made in detail to implementations of the present invention as illustrated in the accompanying drawings. The same reference indicators will be used throughout the drawings and the following detailed description to refer to the same or like parts.
The invention examines content of interest to a user, to determine what concepts are most closely associated with that content. Other content that is closely associated with the same concepts taken in context is more likely to be of interest to the user, and other content that has similar words but different concepts is less likely to be of interest to the user. The invention uses concept information previously gleaned from an analysis of other web pages to better understand the context of a current web page. Concepts extracted from the current web page that are not related to the current context are pruned. The content known to be of interest to the user may be presented along with other content that is closely associated with the concepts related to the current context, thus increasing the likelihood that the user will find the other content interesting.
For example, suppose a user visits a web page describing the “Chihuahua” province of Mexico. The keyword “Chihuahua” may expand to:
But the current context relates to the “Chihuahua” province, not the Chihuahua dog breed. According to the invention, concepts extracted from the current web page that are not related to the current context are pruned, resulting in only concepts related to the current context:

Travel to Chihuahua

Travel to Mexico

Hotels in Chihuahua

Cheap flights

The current web page may be presented along with other content (e.g., paid listings or other websites) that is closely associated with these four concepts that are related to the current context, thus increasing the likelihood that the user will find the other content interesting.
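By way of illustration only, the pruning in the Chihuahua example may be sketched as follows. This is a minimal sketch, not the claimed method: it assumes each expanded concept already carries a context label, and the labels (`"travel"`, `"pets"`), the two dog-related entries, and the function name `prune_by_context` are hypothetical.

```python
# Hypothetical expansion of "Chihuahua"; each concept is tagged with an
# assumed context label. The labels are illustrative, not from the specification.
EXPANSION = {
    "Travel to Chihuahua": "travel",
    "Dog training": "pets",
    "Travel to Mexico": "travel",
    "Hotels in Chihuahua": "travel",
    "Chihuahua dog": "pets",
    "Cheap flights": "travel",
}


def prune_by_context(expansion, current_context):
    """Keep only the concepts whose label matches the context of the current page."""
    return [concept for concept, ctx in expansion.items() if ctx == current_context]


kept = prune_by_context(EXPANSION, "travel")
print(kept)
```

In practice the context label would itself be derived from the content node rather than supplied by hand.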
In the context of the present invention, the term “content node” refers to one or more groupings of data. Example groupings of data include a web page, a paid listing, a search query, and a text file.
In the context of the present invention, the term “concept” refers to a unit of thought, expressed by a term, letter, or symbol. It may be the mental representation of beings or things, qualities, actions, locations, situations, or relations. A concept may also arise from a combination of other concepts. Example concepts include “diabetes,” “heart disease,” “socialism,” and “global warming.”
In the context of the present invention, the term “concept association map” refers to a representation of concepts, concept metadata, and relationships between the concepts.
According to one embodiment of the present invention, the concept association map 130 is derived from different sources. Example sources include concept relationships found on the World Wide Web, associations derived from the browsing history of users 135, the bidding campaigns of advertisers, taxonomies, and encyclopedias.
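One possible in-memory representation of such a map is sketched below, assuming an undirected weighted graph in which each relationship records the sources that contributed to it. The class name, the additive merging of weights, and the concept "insulin" are assumptions made for the example only.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ConceptAssociationMap:
    """Toy map of concepts, concept metadata, and weighted relationships."""
    metadata: dict = field(default_factory=dict)
    edges: dict = field(default_factory=lambda: defaultdict(dict))

    def add_concept(self, concept, **meta):
        # Create the concept if needed and merge any metadata supplied.
        self.metadata.setdefault(concept, {}).update(meta)

    def associate(self, a, b, weight, source):
        # Assumption: evidence from several sources is merged by adding weights.
        self.add_concept(a)
        self.add_concept(b)
        for x, y in ((a, b), (b, a)):
            link = self.edges[x].setdefault(y, {"weight": 0.0, "sources": set()})
            link["weight"] += weight
            link["sources"].add(source)

    def neighbors(self, concept):
        # Related concepts, strongest association first.
        links = self.edges.get(concept, {})
        return sorted(links.items(), key=lambda kv: kv[1]["weight"], reverse=True)


cmap = ConceptAssociationMap()
cmap.associate("diabetes", "insulin", 0.6, source="web")
cmap.associate("diabetes", "insulin", 0.3, source="browsing_history")
cmap.associate("diabetes", "heart disease", 0.4, source="encyclopedia")
print(cmap.neighbors("diabetes"))
```

Keeping the contributing sources on each relationship makes it possible to reweight or drop a source later without rebuilding the map.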
Still referring to
a. Global document frequency of n-grams defining a concept. This measure is indicative of the likelihood that a given n-gram will appear in a document that is part of a corpus.
b. Frequency of n-grams in the content node 100.
c. Similarity of the content node 100 to other content nodes for which relevant concept candidates have already been identified.
d. Weight of the node in the concept graph.
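Statistics (a) and (b) above may be computed, for example, as sketched below. The sketch assumes whitespace tokenization and a small in-memory corpus; the function names and sample documents are hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """Lowercased word n-grams of a text, using whitespace tokenization."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def global_document_frequency(corpus, n):
    """Fraction of corpus documents that contain each n-gram at least once."""
    df = Counter()
    for doc in corpus:
        df.update(set(ngrams(doc, n)))  # count each document once per n-gram
    return {gram: count / len(corpus) for gram, count in df.items()}


def node_frequency(content, n):
    """Number of times each n-gram occurs in a single content node."""
    return Counter(ngrams(content, n))


corpus = [
    "diabetes symptoms and care",
    "heart disease risk",
    "diabetes symptoms explained",
]
df = global_document_frequency(corpus, 2)
tf = node_frequency("diabetes symptoms include thirst and diabetes symptoms vary", 2)
print(df["diabetes symptoms"], tf["diabetes symptoms"])
```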
According to one embodiment of the present invention, concept candidates 140 are extracted from different input sources associated with a page on the World Wide Web, viz. the body of the HTML page, the title, the meta-data tags, the anchor text of hyperlinks pointing to this page, the anchor text of hyperlinks contained in the page, the publishing history of the page, as well as the same type of input sources for pages related to this one.
According to one embodiment of the present invention, the content to be tagged with concepts is provided directly by the user 135, for example in the form of a text file.
According to one embodiment of the present invention, the content to be tagged is any textual section of a relational database, e.g. a product inventory database.
According to another embodiment of the present invention, the node content is a user query, defined as a set of search keywords.
According to another embodiment of the present invention, the concept candidates 140 are provided by the user 135 as input to the system. For example, in a bidding campaign a content provider or a merchant could provide such a list based on internal knowledge about the products to be advertised.
According to one embodiment of the present invention, for a web page, top referral queries on major search engines are also identified as top concepts. For example, if for a URL a.b.com/d, most of the incoming traffic from major search engines comes from users 135 searching for the queries “diabetes” and “diabetes symptoms,” these queries are added as top concepts.
According to another embodiment of the present invention, concepts can also be identified from other pages relevant to the page of interest, for example if the relevant page is structurally similar (through hyperlinks) to the page of interest, or if the relevant page is contextually similar (same content) to the page of interest.
Concept candidate extractor 105 is configured to use the aforementioned statistics to extract suitable concept candidates in the content node 100. This is accomplished by matching the concepts available in the concept association map 130 against the text in the content node 100.
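The matching step may be sketched as follows, assuming the concepts of the map are available as a list of phrases and that matching is done on whole words after lowercasing and stripping punctuation. The function name `extract_candidates` and the sample text are hypothetical.

```python
import re


def _normalize(text):
    """Lowercase the text and collapse it to single-space-separated word tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def extract_candidates(text, known_concepts):
    """Return the known concepts that occur in the text as whole phrases, with counts."""
    normalized = _normalize(text)
    found = {}
    for concept in known_concepts:
        phrase = _normalize(concept)
        if not phrase:
            continue  # skip concepts with no word characters
        hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", normalized))
        if hits:
            found[concept] = hits
    return found


page_text = "Diabetes symptoms vary. Heart disease is linked to diabetes."
candidates = extract_candidates(page_text, ["diabetes", "heart disease", "global warming"])
print(candidates)
```

A linear scan over all concepts is adequate for a sketch; a large map would more plausibly use a trie or similar multi-pattern matcher.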
Concept filterer 115 is configured to rank the concept candidates 140 based at least in part on a measure of relevance that weighs their frequency in the content node 100, their likelihood of appearing in a document, as well as the likelihood of being selected based on the closeness of this content node 100 to similar concept nodes.
According to another embodiment of the present invention, for the case of structured content (e.g., a web page), different content sections are weighted according to their relative importance. For example, the title of a page is weighted more than the body of the page, relative to its length.
The selected concept candidates and their respective scores, ranked in order of decreasing relevance, are:
The mapping of the candidates against the available conceptual map shows that the following concept candidates are associated with a high score relative to other concept candidates:
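A ranking of this kind may be sketched as follows. The specification does not give a formula, so the one used here is an assumption: per-section occurrences are normalized by section length, multiplied by an assumed section weight, and scaled by the logarithm of the inverse global document frequency. The weights and sample frequencies are hypothetical.

```python
import math

# Assumed section weights; the title counts more than the body.
SECTION_WEIGHTS = {"title": 3.0, "body": 1.0}


def rank_candidates(sections, candidates, doc_freq):
    """Rank candidates by length-normalized, section-weighted frequency times rarity."""
    scores = {}
    for concept in candidates:
        weighted_frequency = 0.0
        for name, text in sections.items():
            tokens = text.lower().split()
            if not tokens:
                continue
            # Naive substring count, normalized by the section length in tokens.
            occurrences = text.lower().count(concept.lower())
            weighted_frequency += SECTION_WEIGHTS.get(name, 1.0) * occurrences / len(tokens)
        # doc_freq values are assumed to lie in (0, 1]; rarer concepts score higher.
        rarity = math.log(1.0 / doc_freq[concept])
        scores[concept] = weighted_frequency * rarity
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


sections = {
    "title": "Diabetes symptoms",
    "body": "common symptoms of diabetes include thirst and fatigue in adults",
}
doc_freq = {"diabetes": 0.01, "symptoms": 0.05, "adults": 0.2}
ranked = rank_candidates(sections, ["adults", "symptoms", "diabetes"], doc_freq)
print(ranked)
```

The similarity of the content node to other content nodes could be added as a further multiplicative or additive term.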
Referring again to
One cost function is based on selecting the neighbors that present the best clustering characteristics, i.e. they are more likely to be strongly associated with each other.
Another cost function is based on selecting neighbors that, based on aggregate user activity history, are more likely to be associated.
Another cost function is based on looking at the likelihood that such concepts are selected together based on their co-occurrence in a corpus of documents (e.g. the World Wide Web).
Another cost function is based on determining which neighboring concepts in the concept association map are tied to a form of monetization (e.g. online advertisement) that yields the highest conversion rate (measured as a combination of CPC and CTR).
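The cost functions above can be treated as interchangeable scoring callbacks. The sketch below shows only the co-occurrence variant, under the assumption that each document has already been reduced to the set of concepts it mentions; the sample documents and the function names are hypothetical.

```python
def cooccurrence_score(seed, neighbor, documents):
    """Estimate P(neighbor | seed): share of documents mentioning seed that also mention neighbor."""
    with_seed = [doc for doc in documents if seed in doc]
    if not with_seed:
        return 0.0
    return sum(1 for doc in with_seed if neighbor in doc) / len(with_seed)


def select_neighbors(seed, neighbors, cost_fn, k):
    """Return the k neighbors that score highest under the supplied cost function."""
    return sorted(neighbors, key=lambda n: cost_fn(seed, n), reverse=True)[:k]


# Each document is represented by the set of concepts found in it (assumed input).
DOCS = [
    {"chihuahua", "mexico", "hotels"},
    {"chihuahua", "mexico"},
    {"chihuahua", "dog training"},
    {"mexico", "hotels"},
]

best = select_neighbors(
    "chihuahua",
    ["hotels", "dog training", "mexico"],
    lambda seed, n: cooccurrence_score(seed, n, DOCS),
    k=1,
)
print(best)
```

A clustering, user-activity, or monetization cost function would be passed as `cost_fn` in the same way.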
According to one embodiment of the present invention, the nodes in the concept association map are also tagged with labels representing one or more high-level categories.
According to one embodiment of the present invention, a page content classifier 120 is utilized to label the page with a high level category in order to narrow down the mapping to the concept association map 130 to certain pre-defined contexts.
According to one embodiment of the present invention, results on the concept association map 130 are clustered to identify different user intentions.
According to another embodiment of the present invention, the highest-weighted concepts in the graph are chosen as top related concepts. The weight score can be defined using different sources. Examples of weights to be used are structural scores such as “betweenness” and “page rank,” monetization values such as click-through and cost per click, and frequency of appearance on the web or in users' query logs.
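One simple way to obtain such clusters is sketched below: weak associations are dropped and the connected components of what remains are treated as separate intentions. The threshold, the edge weights, and the choice of connected components as the clustering technique are assumptions for the example; the specification does not name a particular clustering algorithm.

```python
def cluster_concepts(edges, min_weight):
    """Group concepts into connected components, ignoring edges below min_weight."""
    adjacency = {}
    for (a, b), weight in edges.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical association weights between concepts matched for "Chihuahua".
EDGES = {
    ("Travel to Chihuahua", "Hotels in Chihuahua"): 0.8,
    ("Travel to Chihuahua", "Travel to Mexico"): 0.7,
    ("Travel to Mexico", "Cheap flights"): 0.6,
    ("Chihuahua dog", "Dog training"): 0.9,
    ("Travel to Chihuahua", "Chihuahua dog"): 0.1,
}
clusters = cluster_concepts(EDGES, min_weight=0.5)
print(clusters)
```

With these weights the weak link between the travel concepts and the dog concepts is discarded, leaving one cluster per intention.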
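As one example of a structural score, a basic PageRank computation over a small directed concept graph is sketched below. The damping factor, the iteration count, and the sample graph are assumptions; every target is assumed to appear as a key of the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping each node to its list of targets."""
    nodes = list(graph)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / len(nodes)
        rank = new_rank
    return rank


# Hypothetical concept graph; "insulin" is not taken from the specification.
GRAPH = {
    "diabetes": ["insulin", "heart disease"],
    "insulin": ["diabetes"],
    "heart disease": ["diabetes"],
    "global warming": ["diabetes"],
}
ranks = pagerank(GRAPH)
top_concept = max(ranks, key=ranks.get)
print(top_concept)
```

Monetization values or query-log frequencies could be blended with the structural score before the top concepts are chosen.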
According to another embodiment of the present invention, top concepts and regions of interest are used to map paid listings or other forms of advertisement to the content node as described in
In the interest of clarity, not all of the routine features of the implementations described herein are shown and described. It will, of course, be appreciated that in the development of any such actual implementation, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with application- and business-related constraints, and that these specific goals will vary from one implementation to another and from one developer to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the art having the benefit of this disclosure.
According to one embodiment of the present invention, the components, process steps, and/or data structures may be implemented using various types of operating systems (OS), computing platforms, firmware, computer programs, computer languages, and/or general-purpose machines. The method can be run as a programmed process running on processing circuitry. The processing circuitry can take the form of numerous combinations of processors and operating systems, connections and networks, data stores, or a stand-alone device. The process can be implemented as instructions executed by such hardware, hardware alone, or any combination thereof. The software may be stored on a program storage device readable by a machine.
While embodiments and applications of this invention have been shown and described, it would be apparent to those skilled in the art having the benefit of this disclosure that many more modifications than mentioned above are possible without departing from the inventive concepts herein. The invention, therefore, is not to be restricted except in the spirit of the appended claims.
Number | Date | Country | |
---|---|---|---|
20200051093 A1 | Feb 2020 | US |
Number | Date | Country | |
---|---|---|---|
61050958 | May 2008 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 12436748 | May 2009 | US |
Child | 16545689 | US |