A software and/or hardware facility is described that identifies and characterizes web feeds and web feed content for purposes of populating a resource with targeted content. The facility identifies web feeds and characterizes the web feeds and/or content in the web feeds using attributes from an attribute taxonomy. The characterization may be based on metadata associated with the web feed and/or on content in the web feed. Once the web feeds and/or content of the web feeds have been characterized, the web feeds and content may be appropriately selected for delivery to users and/or services in a targeted manner. Specifically, when a user or a service requests a specific resource, web feeds and content from web feeds may be selected for inclusion in the resource. The resource may be a web page that contains content or search results, a document containing content for a user, an XML document responsive to a service request, a metafeed (as defined below), or any other resource that would benefit from targeted content derived from web feeds. The web feeds and content that are selected may be based on characteristics of the requesting user or service (such as the identity or location of the user or service), characteristics of the resource (such as the Uniform Resource Identifier (URI) or general subject matter of the resource), or any other characteristic that is known by the facility that may be used to better target the populated content to the requesting user or service.
In some embodiments, the facility relies upon an attribute taxonomy that is partitioned in a variety of ways. One set of attributes may be a geographical classification system that may be used to characterize web feeds or web feed content based upon a geography (e.g., city, state, region, latitude and longitude) associated with that feed or content. A second set of attributes may be a topical classification system, which may be used to characterize web feeds or content based upon a topical classification (e.g., news, sports, product reviews) associated with that feed or content. Additional sets of attributes may be defined by the facility depending on the identity of the web feed and content, as well as the type of resource that is to be populated. Appropriate attributes are selected by the facility and associated with web feeds and feed content in order to characterize the web feeds and feed content. By utilizing an attribute taxonomy that has been partitioned in a variety of ways, the facility can provide highly targeted web feed and web feed content to requesting users or services.
In some embodiments, the facility utilizes the characterized web feeds and/or web feed content to populate web pages identified by a uniform resource identifier (URI). Uniform resource identifiers, such as “www.seattlekayaks.com,” are used to identify the location of specific content on the web or other network, and will often contain information within the URI that suggests the content that should be associated with that resource. For example, the URI www.seattlekayaks.com suggests both a geography (Seattle) and a topic (kayaks) that might be relevant to the URI. By analyzing the URI, the facility is able to select appropriate web feeds and web feed content that may be preferably associated with a resource accessed via the URI. The facility may use the web feed and web feed content to populate one or more resources that are accessible via the URI or any extensions of the URI (e.g., sales.seattlekayaks.com or seattlekayaks.com/trips). The facility thereby allows resources to be automatically created that are relevant and useful to users requesting the resource associated with the URI.
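By way of illustration only, the URI analysis described above might be sketched as follows. The vocabularies, the function names (segment, keywords_from_uri), and the greedy longest-match strategy are assumptions made for this sketch; the description does not specify how a URI is decomposed into geographic and topical terms.

```python
from urllib.parse import urlparse

# Hypothetical vocabularies; a real facility would draw these from its attribute taxonomy.
GEOGRAPHY_TERMS = {"seattle", "portland", "boston"}
TOPIC_TERMS = {"kayaks", "trips", "sales", "news"}
IGNORED_LABELS = {"www", "com", "org", "net"}


def segment(label, vocabulary):
    """Greedy longest-match segmentation of a string into known terms."""
    found, i = [], 0
    while i < len(label):
        for j in range(len(label), i, -1):
            if label[i:j] in vocabulary:
                found.append(label[i:j])
                i = j
                break
        else:
            i += 1  # no known term starts here; skip one character
    return found


def keywords_from_uri(uri):
    """Return the geographic and topical terms suggested by a URI."""
    parsed = urlparse(uri if "//" in uri else "//" + uri)
    host = parsed.hostname or ""
    labels = [part for part in host.split(".") if part not in IGNORED_LABELS]
    path_parts = [part for part in parsed.path.split("/") if part]
    terms = []
    for part in labels + path_parts:
        terms.extend(segment(part.lower(), GEOGRAPHY_TERMS | TOPIC_TERMS))
    return {
        "geography": [t for t in terms if t in GEOGRAPHY_TERMS],
        "topic": [t for t in terms if t in TOPIC_TERMS],
    }
```

Under these assumptions, keywords_from_uri("www.seattlekayaks.com") yields the geography term "seattle" and the topic term "kayaks", and an extension such as seattlekayaks.com/trips additionally yields the topic term "trips".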
Various embodiments of the invention will now be described. The following description provides specific details for a thorough understanding and an enabling description of these embodiments. One skilled in the art will understand, however, that the invention may be practiced without many of these details. Additionally, some well-known structures or functions may not be shown or described in detail, so as to avoid unnecessarily obscuring the relevant description of the various embodiments. The terminology used in the description presented below is intended to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific embodiments of the invention.
Once a web feed has been identified by the feed acquisition component 110, it is analyzed by the feed characterization component 120 and characterized using one or more attributes contained in an attribute taxonomy 180.
At a block 260, the feed characterization component 120 obtains any metadata associated with the web feed that describes the contents of the web feed. Those skilled in the art will appreciate that the metadata associated with the feed can come from a variety of different sources. For example, certain information about the feed may be obtained from the content provider website, the aggregator website or from other sites that characterize the feed. Additionally, web feeds typically contain characterizing data in the feed itself, such as a title, a category, a copyright notice, or a short description of the web feed contents. Finally, the feed characterization component may monitor one or more pieces of content that are provided over the feed in order to identify representative pieces of content that are typically contained in the web feed. All of these sources of metadata may be used by the facility to characterize the general contents of the web feed.
At a block 265, the facility processes the metadata in order to characterize the web feed. Preferably, the feed characterization component 120 analyzes the metadata and associates significant terms or concepts in the metadata with one or more of the attributes from the attribute taxonomy 180. As was previously mentioned, attribute taxonomy 180 contains an organized structure of attributes. In some embodiments, the attribute taxonomy includes various sets of related attributes. One set of attributes may be a geographical classification system that includes geographic terms (e.g., city, state, region, latitude and longitude) that may be associated with a web feed or content. A second set of attributes may be a topical classification system that includes various topic descriptions (e.g., news, sports, product reviews) that may be associated with a web feed or content. A third set of attributes may be a language classification system that includes languages (e.g., English, Spanish, French). A fourth set of attributes may be a format of a web feed or content (e.g., text, image, audio, video). Still other sets of attributes may be defined by the facility operator, depending on the intended use of the facility. The facility may also provide a tool to allow an operator to create, modify and delete attributes and attribute relationships within the attribute taxonomy 180. Analyzing the metadata may encompass identifying significant terms in the metadata, such as terms that occur frequently or that are associated with named locations. The significant terms identified in the metadata are matched against those attributes that are contained in the attribute taxonomy 180. Matches or near matches may indicate that the identified attribute has some association with the corresponding web feed from which the metadata was obtained. Such attributes may therefore be matched with the web feed, and may be derived from one or more of the various attribute sets in the attribute taxonomy.
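As one possible sketch of the matching performed at block 265, significant terms may be taken to be words that recur in the metadata, and near matches may be found with a string-similarity cutoff. The taxonomy contents, the frequency threshold, the cutoff value, and the function names below are all assumptions for illustration; the description leaves the choice of matching algorithm open.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Hypothetical attribute taxonomy partitioned into attribute sets.
TAXONOMY = {
    "geography": ["seattle", "california", "oregon"],
    "topic": ["news", "sports", "technology", "kayaking"],
    "language": ["english", "spanish", "french"],
    "format": ["text", "image", "audio", "video"],
}


def significant_terms(metadata, min_count=2):
    """Treat words that occur at least min_count times as significant."""
    words = re.findall(r"[a-z]+", metadata.lower())
    counts = Counter(words)
    return [word for word, count in counts.items() if count >= min_count]


def characterize(metadata, cutoff=0.8):
    """Map significant terms to taxonomy attributes, accepting near matches."""
    assigned = {}
    for term in significant_terms(metadata):
        for set_name, attributes in TAXONOMY.items():
            hits = get_close_matches(term, attributes, n=1, cutoff=cutoff)
            if hits:
                assigned.setdefault(set_name, set()).add(hits[0])
    return assigned
```

In this sketch the recurring word "sport" is accepted as a near match for the topic attribute "sports". Multi-word attributes and named-location detection would require additional handling that is not shown.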
Alternatively, or in addition to the automatic assignment of attributes to a web feed, a manual assignment of attributes to a web feed may be undertaken by an operator of the facility or by users of the facility. For example, the facility may automatically assign attributes to a web feed and an operator may confirm that the attributes have been correctly assigned before the web feed is further utilized by the facility. As another example, the facility may automatically assign attributes to a web feed and use the web feed without prior review. As part of the use of the web feed, however, the facility may provide a feedback mechanism to an end user to allow the end user to indicate to the facility or facility operator that the web feed had been incorrectly characterized. Such a feedback mechanism may range from a button allowing the user to indicate that the web feed appears to be incorrectly characterized, to a menu or form that allows the user to assign attributes to the web feed. By identifying a group of attributes that characterize a web feed from a normalized set of attribute terms, groupings of similar web feeds may be more readily identified. The assignment of attributes therefore allows the facility to accurately select web feeds and web feed content for inclusion in a targeted resource.
Once the feed characterization component 120 identifies one or more attributes that are associated with a web feed, the location of the web feed and the associated attributes are stored by the facility in a feed database 190. At a block 270, the feed location is stored in the feed database. The feed location is a URI or other pointer to a location where the feed may be retrieved. At a block 275, the feed characterization component stores the attributes that were identified by the automatic processing of metadata and/or by the manual characterization by the facility operator.
While the attributes have been represented in textual form in table 300 for benefit of this description, it will be appreciated that an actual implementation of the table may represent the attributes by unique codes, by a search index, or in other form. Moreover, while the attributes in record 340 are all drawn from the same set of topic attributes, other records may have attributes drawn from more than one set of attributes. For example, a representative record 350 includes both topic attributes (e.g., “technology,” “computers”) and geographic attributes (“Silicon Valley”). Other records may include format attributes (e.g., “video”) and other attributes from the attribute sets discussed above. On a periodic basis, the attributes in the table may be updated to reflect changes to the web feed, such as changes to the web feed content, feed publisher, or feed location.
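For illustration, a record of the kind described above, holding a feed location together with attributes drawn from more than one attribute set, might be modeled as follows. The class names FeedRecord and FeedDatabase, the in-memory dictionary, and the example feed location are hypothetical; as noted, an actual implementation may instead use unique codes or a search index.

```python
from dataclasses import dataclass, field


@dataclass
class FeedRecord:
    location: str                                   # URI or other pointer to the feed
    attributes: dict = field(default_factory=dict)  # attribute set name -> set of terms


class FeedDatabase:
    """Minimal in-memory stand-in for a feed database."""

    def __init__(self):
        self._records = {}

    def store(self, location, attributes):
        """Store the feed location and its attributes, replacing any
        earlier record so that periodic updates reflect feed changes."""
        record = FeedRecord(
            location, {name: set(terms) for name, terms in attributes.items()}
        )
        self._records[location] = record
        return record

    def get(self, location):
        return self._records.get(location)


# Usage with a made-up feed location.
db = FeedDatabase()
db.store(
    "http://example.com/feed.xml",
    {"topic": ["technology", "computers"], "geography": ["Silicon Valley"]},
)
```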
In addition to characterizing web feeds based on metadata associated with a web feed, the web feed facility 100 may also receive, characterize, and store specific content from web feeds.
Once web feed content has been received by the content acquisition component 130, it is analyzed by the content characterization component 140 and characterized using one or more attributes contained in the attribute taxonomy 180.
At block 420, the facility processes the content to identify corresponding attributes that should be associated with the content. Preferably, the content characterization component 140 analyzes the content and associates significant terms or concepts in the content with one or more of the attributes from the attribute taxonomy 180. As was previously discussed, attribute taxonomy 180 contains an organized structure of attributes that may be partitioned into various sets, such as geographic terms, topic descriptions, and other identifiers. Analyzing the content may encompass identifying significant terms in the content, such as terms that occur frequently or that are associated with named locations. The significant terms identified in the content are matched against those attributes that are contained in the attribute taxonomy 180. Matches or near matches may indicate that the identified attribute has some association with the corresponding web feed content. Such attributes may therefore be matched with the web feed content, and may be derived from one or more of the various attribute sets in the attribute taxonomy. Alternatively, or in addition to the automatic assignment of attributes to web feed content, a manual assignment of attributes to web feed content may be undertaken by an operator of the facility or by users of the facility. For example, the facility may automatically assign attributes to web feed content and an operator may confirm that the attributes have been correctly assigned before the web feed content is further utilized by the facility. As another example, the facility may automatically assign attributes to web feed content and use the web feed content without prior review. As part of the use of the web feed content, however, the facility may provide a feedback mechanism to an end user to allow the end user to indicate to the facility or facility operator that the web feed content had been incorrectly characterized.
Such a feedback mechanism may range from a button allowing the user to indicate that the web feed content appears to be incorrectly characterized, to a menu or form that allows the user to assign attributes to the web feed content. By identifying a group of attributes that characterize web feed content from a normalized set of attribute terms, groupings of similar content may be more readily identified. The assignment of attributes therefore allows the facility to accurately select web feed content for inclusion in a targeted resource.
Once the content characterization component 140 identifies one or more attributes that are associated with content from a web feed, the source of the content, the content itself, and the associated attributes are stored by the facility in a content database 200. At a block 430, the content characterization component stores an identifier indicating the source of the web feed content. At a block 440, the content characterization component stores a copy of the web feed content in the content database 200. By storing content locally, the facility may be able to use the content in a variety of different ways as will be described below. Depending on the intended use for the content and consistent with use restrictions that a publisher may place on content, the facility may opt to retain only the most recent content that it receives for a particular web feed, or it may opt to retain a significant number of pieces of content received from a web feed. If only the most recent content is to be stored, the previous content that was received by the facility for a particular web feed may be overwritten or otherwise deleted at block 440. If multiple pieces of content from a single web feed are to be stored, the facility may time stamp or otherwise identify the order that the pieces of content are received so that the pieces may be managed and deleted (if desired) in the future. At a block 450, the content characterization component stores the attributes that were identified by the automatic processing of content and/or by the manual characterization by the facility operator.
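The two retention options described above (retaining only the most recent content, or retaining multiple time-stamped pieces) might be sketched as follows. The ContentStore class, its parameters, and the cap on retained items are assumptions for this sketch and are not drawn from the description.

```python
import time
from collections import defaultdict


class ContentStore:
    """Minimal in-memory stand-in for a content database."""

    def __init__(self, keep_latest_only=False, max_items=100):
        self.keep_latest_only = keep_latest_only
        self.max_items = max_items  # assumed to be at least 1
        # feed source -> list of (timestamp, content, attributes)
        self._items = defaultdict(list)

    def store(self, source, content, attributes, timestamp=None):
        stamp = time.time() if timestamp is None else timestamp
        entry = (stamp, content, frozenset(attributes))
        if self.keep_latest_only:
            self._items[source] = [entry]        # overwrite the previous content
        else:
            items = self._items[source]
            items.append(entry)
            items.sort(key=lambda e: e[0])       # order pieces by time stamp
            del items[:-self.max_items]          # delete the oldest beyond the cap
        return entry

    def latest(self, source):
        items = self._items.get(source)
        return items[-1] if items else None
```

The time stamp recorded with each entry is what allows older pieces of content to be managed and deleted later, as described for block 440.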
In order to identify appropriate feeds or content, the facility includes a search and browse component 210. The search and browse component allows the facility to quickly identify relevant web feeds and feed content in response to a search query, such as to provide a set of search results to a user. The search and browse component may also be used by the facility for purposes of populating a resource as described herein. To improve response times and enable the quick identification of relevant web feeds and web feed content, the search and browse component may rely on the feed index 220 and the content index 230 that are maintained by the facility.
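One common way to obtain the quick lookups described above is an inverted index that maps each attribute to the feeds characterized by it. The following is a minimal sketch under that assumption; the description does not state how the feed index or the content index is structured, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Build an inverted index from a mapping of
    feed location -> iterable of attribute terms."""
    index = defaultdict(set)
    for location, attributes in records.items():
        for attribute in attributes:
            index[attribute.lower()].add(location)
    return index


def search(index, query):
    """Return the locations whose attributes include every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

With such an index, a query is answered by intersecting small sets rather than by scanning every stored record.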
Once web feeds and/or web feed content have been characterized by the facility by assigning attributes to the web feed and/or web feed content, the facility may use the characterizations to populate a resource that is requested by a user 165 or service 175. The facility includes a content targeting component 215 that selects appropriate web feed and/or web feed content for inclusion in resources. The selection is made in a manner that targets the web feed and/or web feed content to the subject matter of the resource, the interests of the user, or the request of the service. By matching web feeds and/or web feed content to the subject of the resource, the interests of the user, or the request of the service, the facility is able to provide the user or service with compelling content that is tailored for that particular user or service.
In a second region 650, the facility identifies one or more web feeds 652 that are also generally related to the subject matter indicated by the resource URI.
An advertising region 665 is provided on the web page 620 to display advertisements that are relevant to the resource.
A control region 670 is provided to allow a user to adjust the type and quantity of targeted content that appears in the resource. A first button 675 is provided to allow a user to change the configuration and content of the web page. Such changes may include, but are not limited to, changing the number of web feeds and web feed content on the page, changing the overall layout of the page (e.g., by adding or removing regions on the page), or changing the content displayed on the page by changing, prioritizing, or weighting the terms used to predict the subject matter of the page (e.g., in the example above, by indicating that “kayaks” is a more important term than “seattle” in the URI, or by adding an additional term “white water”). Any changes in the configuration and content of the web page may be stored in a profile that is associated with the user. When the user subsequently visits the web page, the user may be identified by various techniques (e.g., using cookies or a log-in screen) and the web page tailored to the user based on the previously-specified changes. A second button 680 is provided to allow a user to make the displayed web page the “home page” on their browser. Those skilled in the art will appreciate that other controls may be provided to allow a user to change the content as well as the look and feel of the web page 620. Moreover, the facility operator is also able to configure the look and feel of the web page for certain classes of users depending on the content displayed to users and on other factors.
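As a simple illustration of how stored profile changes might be applied on a later visit, the terms derived from the URI can be given default weights that the profile then overrides or extends. The profile layout (the "weights" and "added_terms" keys), the default weight of 1.0, and the function name are hypothetical.

```python
def effective_weights(default_terms, profile):
    """Combine URI-derived terms with a user's stored configuration.

    default_terms: terms predicted from the resource (e.g., from the URI).
    profile: hypothetical stored profile with optional "weights" and
             "added_terms" entries.
    """
    weights = {term: 1.0 for term in default_terms}
    weights.update(profile.get("weights", {}))       # user-specified priorities
    for term in profile.get("added_terms", []):
        weights.setdefault(term, 1.0)                # user-added terms
    return weights


# Usage: "kayaks" is marked as more important than "seattle",
# and "white water" is added as an additional term.
profile = {"weights": {"kayaks": 2.0}, "added_terms": ["white water"]}
weights = effective_weights(["seattle", "kayaks"], profile)
```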
A keyword region 685 is provided to allow a user to select one of a number of displayed keywords 690 that are related to the contents of the page. Selecting one of the keywords 690 causes the contents of the page to be updated to contain, or the user to be redirected to a different resource that contains, content that is related to the selected keyword. In some embodiments, the keywords that are displayed are the attributes associated with the web feed and the web feed content displayed on the page. The attributes may be ranked in accordance with relevance to the page content or selected for display using a different algorithm.
In addition to populating web feeds and web feed content on a web page associated with a URI, it will be appreciated that the content may also be used in a variety of other applications where resources are to be tailored for a user or service. For example, using the techniques described herein, web feeds and/or web feed content may be selected as search results in response to a search query. As another example, web feeds and web feed content may be used to construct XML documents relating to a particular subject. As still another example, web feeds and web feed content can be bundled together to create “metafeeds.” Metafeeds are feeds that aggregate content from multiple feeds and provide the aggregated content to a user or service. Other uses will be appreciated by those skilled in the art.
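A metafeed of the kind defined above might be assembled by merging the items of several feeds into a single stream ordered by publication time. In the sketch below, the item layout (dictionaries with "title" and "published" keys), the limit parameter, and the assumption that each input feed is already sorted newest first are illustrative choices only.

```python
import heapq
from itertools import islice


def build_metafeed(feeds, limit=10):
    """Merge several feeds into one newest-first list of items.

    feeds: iterable of lists of items, each list already sorted
           newest first by its "published" value.
    """
    merged = heapq.merge(
        *feeds, key=lambda item: item["published"], reverse=True
    )
    return list(islice(merged, limit))


# Usage with made-up items.
feed_a = [{"title": "a2", "published": 5}, {"title": "a1", "published": 1}]
feed_b = [{"title": "b1", "published": 3}]
metafeed = build_metafeed([feed_a, feed_b])
```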
After retrieving information characterizing the resource, or identifying that no previously-generated information characterizing the resource exists, processing by the content targeting component proceeds to a decision block 740. At decision block 740, the content targeting component determines whether there is any information characterizing the requestor that may be used to select targeted content for the resource. Information about the requestor may include, but is not limited to: (i) categories of interest to the requestor (e.g., politics or sports); (ii) prior viewing behavior of the requestor (e.g., the requestor previously received other resources that shared a common subject matter); (iii) prior search behavior of the requestor; (iv) purchasing behavior of the requestor (e.g., the requestor previously purchased a certain brand of mobile phone); (v) demographic information about the requestor; (vi) geographic information about the requestor (e.g., information explicitly received from the requestor or information that can be inferred about the requestor, such as the internet address of the requestor which may be used to estimate a physical location); (vii) any information that the requestor may have previously provided to the facility operator; or (viii) any other information about the requestor that may be stored by the facility and associated with a requestor. Those skilled in the art will appreciate that information characterizing the requestor may be associated by the facility with the requestor using a cookie on the user's browser or with any other technology that allows stored information to be maintained between resource deliveries to a requestor. If any information characterizing the requestor exists, the facility proceeds to a block 750 where the information is retrieved.
Such information characterizing the requestor can take a variety of forms, but can typically be represented as, or easily converted into, a series of keywords that are associated with the resource itself or with a profile of the requestor maintained by the facility. The keywords associated with the requestor may change over time, and may be modifiable by the requestor by accessing such functionality as the “configure this page” button 675.
After any information characterizing the resource or characterizing the requestor has been identified, at block 760 the facility utilizes the information to select one or more web feeds that are deemed to be of interest to the requestor and that may be used to populate the resource. Web feeds are identified by using the keywords that were identified as characterizing the resource or the requestor in blocks 730 and 750 in one or more search queries to identify those web feeds in the feed database 190 that are responsive to the queries. Those skilled in the art will appreciate that a variety of algorithms may be used to weight or otherwise prioritize the keywords in order to identify the most relevant web feeds to the resource or requestor. Moreover, other techniques may be used to correlate keywords associated with the resource or requestor with web feeds that are characterized by attributes in the feed database 190.
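The description leaves the weighting algorithm open. One simple possibility, shown here only as a sketch, is to score each feed by the summed weight of the keywords that appear among its attributes and to select the highest-scoring feeds. The function name, the weight values, and the tie-breaking rule are assumptions.

```python
def rank_feeds(feed_attributes, keyword_weights, top_n=3):
    """Rank feeds by the summed weight of matching keywords.

    feed_attributes: mapping of feed location -> iterable of attributes.
    keyword_weights: mapping of keyword -> weight, derived from the
                     resource and/or the requestor.
    """
    scores = []
    for location, attributes in feed_attributes.items():
        normalized = {attribute.lower() for attribute in attributes}
        score = sum(
            weight
            for keyword, weight in keyword_weights.items()
            if keyword.lower() in normalized
        )
        if score > 0:
            scores.append((score, location))
    # Highest score first; ties are broken by location for stable output.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [location for _, location in scores[:top_n]]


# Usage with made-up feeds.
feeds = {
    "f1": ["seattle", "news"],
    "f2": ["kayaks", "seattle"],
    "f3": ["cooking"],
}
selected = rank_feeds(feeds, {"kayaks": 2.0, "seattle": 1.0})
```

Feeds with no matching keyword receive a score of zero and are omitted, so "f3" is not selected in this example.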
At block 770, the facility identifies content from web feeds that may be used to populate the resource. Web feed content may be identified in one of two ways. In a first approach, once web feeds have been identified by the content targeting component at block 760, the feeds may be monitored in order to receive the latest content posted by the feed provider. Such received content would be presumed to be related to the resource or the requestor, since the web feed from which the content is received was characterized by the facility as being related to the resource or requestor. In some cases, however, the content in a particular web feed may vary considerably, such as when the web feed is a general news source. In cases where the web feed content may vary, a second approach may be more suitable to locate web feed content that is related to the resource or requestor. In the second approach, web feed content is identified by using the keywords that were identified as characterizing the resource or the requestor in blocks 730 and 750 in one or more search queries to identify those items of content in the content database 200 that are responsive to the queries. The second approach differs from the first approach in that the web feed content is always readily available for use by the content targeting component and the identified web feed content is more likely to be closely matched with the resource or requestor. Those skilled in the art will appreciate that a variety of algorithms may be used to weight or otherwise prioritize the keywords in order to identify the most relevant web feed content to the resource or requestor. Moreover, other techniques may be used to correlate keywords associated with the resource or requestor with web feed content that is characterized by attributes in the content database 200.
At block 780, the content targeting component 215 populates the resource with the identified web feeds and/or web feed content.
While various embodiments are described in terms of the environment described above, those skilled in the art will appreciate that various changes to the facility may be made without departing from the scope of the invention. For example, attribute taxonomy 180, feed database 190, content database 200, feed index 220, and content index 230 are all indicated as being contained in a general data storage area 240. Those skilled in the art will appreciate that the actual implementation of the data storage area 240 may take a variety of forms, and the term “database” is used herein in the generic sense to refer to any data structure that allows data to be stored and accessed, such as tables, linked lists, arrays, etc.
Those skilled in the art will also appreciate that the facility may be implemented in a variety of environments including a single, monolithic computer system, a distributed system, as well as various other combinations of computer systems or similar devices connected in various ways. Moreover, the facility may utilize third-party services and data to implement all or portions of the information functionality.
From the foregoing, it will be appreciated that specific embodiments of the invention have been described herein for purposes of illustration, but that various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.