This disclosure pertains generally to electronic information aggregation, and more specifically to reducing redundant content in the context of a multi-source feed reader.
A multi-source feed reader aggregates content from multiple syndicated feeds. Typically, multi-source feed readers are web based, such that syndicated information such as news, blogs, and/or similar items from multiple feeds are aggregated at a single web site for convenient access. With the popularity of multi-source feed readers such as Google Reader, NetVibes, BlogLines and even Microsoft Outlook, it is apparent that blogs have become an important source of information for many people, especially those who are more technical.
A problem with getting one's news and other information from a multi-source feed reader is that many users subscribe to more than one feed on a single topic (e.g., both Gizmodo and Engadget on consumer electronics devices, or both Mashable and TechCrunch on Web 2.0 issues). This would not be a problem but for the fact that most blogs and many other web information providers derive their content from the same sources: other blogs, online news sites and mainstream media (newspapers, television, etc.). Typically, a single party breaks a story, and then every other online information provider repeats the story in slightly modified language. This results in multi-source feed readers providing the same information to users multiple times, in slightly different forms.
Note that the problem is not that a multi-source feed reader provides an identical article or blog entry multiple times. Conventional duplicate elimination functionality can eliminate exact duplicates of individual items (i.e., multiple copies of the same article by the same author). The problem is that the multi-source feed reader provides multiple articles, blog entries and the like to a user, each of which is from a different source and is not a word for word copy, but which is based on a common underlying original source (either directly or indirectly), and thus contains essentially redundant information in somewhat different language. As such, the multiple items are cumulative and redundant to each other. Although such an article may at first appear to a user to be new information, upon reading it the user realizes that it is essentially the same as the other articles to which it is cumulative, although its format and exact wording vary.
It would be desirable to address these issues.
A redundancy reducing management system automatically reduces redundant content items provided to a user by a multi-source feed reader. A list of a plurality of feeds to which the user subscribes through the multi-source feed reader is maintained. The multi-source feed reader obtains content items from the feeds to which the user subscribes. Each content item obtained by the multi-source feed reader from each feed of the plurality is received by the redundancy reducing management system prior to being made available to the user. Each received content item is analyzed, to determine whether it is based on source content from a different feed. More specifically, the content item can be searched for one or more attributes indicating a source on which the specific received content item is based. Where it is determined that a specific received content item from a first feed is based on source content from a second feed, it is further determined whether the user subscribes to the second feed which the source content is from. If so, the specific received content item from the first feed is not provided to the user, responsive to determining that the specific received content item is based on source content from the second feed and that the user subscribes to the second feed.
In some instances, it is determined that a received content item is not based on source content from a feed to which the user subscribes. This determination can be made by searching for but not identifying any attributes in the content item indicating a source on which it is based. Such a determination can also be made by identifying an attribute in the content item indicating a source on which it is based, and determining that the source does not comprise an additional feed to which the user subscribes. In these instances, the content item is provided to the user, responsive to determining that the content item is not based on source content from a feed to which the user subscribes.
In some embodiments where a plurality of redundant content items are identified, an indication is received from the user to provide redundant content items. Responsive to receiving this indication, the plurality of redundant content items are output to the user. This can take the form of classifying the redundant content items into a hierarchy, displaying a graphic depiction of the hierarchy of redundant content items to the user and prompting the user to select and view redundant content items of the hierarchy.
The features and advantages described in this summary and in the following detailed description are not all-inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the relevant art in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.
The Figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
Clients 103 and servers 105 can be implemented using computer systems 210 such as the one illustrated in
Although
Other components (not illustrated) may be connected in a similar manner (e.g., document scanners, digital cameras, printers, etc.). Conversely, all of the components illustrated in
The bus 212 allows data communication between the processor 214 and system memory 217, which, as noted above, may include ROM and/or flash memory as well as RAM. The RAM is typically the main memory into which the operating system and application programs are loaded. The ROM and/or flash memory can contain, among other code, the Basic Input-Output System (BIOS), which controls certain basic hardware operations. Application programs can be stored on a local computer readable medium (e.g., hard disk 244, optical disk 242) and loaded into system memory 217 and executed by the processor 214. Application programs can also be loaded into system memory 217 from a remote location (i.e., a remotely located computer system 210), for example via the network interface 248 or modem 247. In
The storage interface 234 is coupled to one or more hard disks 244 (and/or other standard storage media). The hard disk(s) 244 may be a part of computer system 210, or may be physically separate and accessed through other interface systems.
The network interface 248 and/or modem 247 can be directly or indirectly communicatively coupled to a network 107 such as the Internet. Such coupling can be wired or wireless.
As illustrated in
As illustrated, a multi-source feed reader 305 runs on the computer 210. The multi-source feed reader 305 can be in the form of a hosted application like Google Reader, NetVibes or BlogLines, or in the form of a client based application like Microsoft Outlook. The multi-source feed reader 305 uses conventional functionality to allow the user 303 to subscribe to and unsubscribe from multiple feeds 307. The multi-source feed reader 305 also uses conventional functionality to obtain and enable the user 303 to access content items 301 from feeds 307 to which s/he subscribes. In addition to conventional multi-source feed reader 305 functionality, the multi-source feed reader 305 makes the user's current feed subscription status available to a subscription tracking module 315 of the redundancy reducing management system 101, and provides feed originated content items 301 to an item analyzing module 311 of the redundancy reducing management system 101 prior to making them available to the user 303.
The subscription tracking module 315 maintains a list 313 of all the feeds 307 to which the user 303 currently subscribes. The item analyzing module 311 receives content items 301 as input. The item analyzing module 311 also has access to the feed subscription list 313 maintained by the subscription tracking module 315. The item analyzing module 311 analyzes each received content item 301, to determine whether it is based on similar content from another feed 307 to which the user 303 subscribes.
More specifically, a source attribute identifying module 317 of the redundancy reducing management system 101 identifies attributes in the content item 301 which indicate a source on which the content item 301 is based. Where such an attribute is found, it is determined whether the source on which the content item 301 is based is a feed 307 other than the feed 307 from which the content item 301 originates. As explained above, many content items 301 are based on information gleaned from other sources, and are thus redundant to the original underlying source item, as well as to any other content items 301 based either directly or indirectly on the underlying source. Attributes identifying a source on which a content item 301 being analyzed is based can be in the form of text such as “Via SOURCENAME”, “Source SOURCENAME”, “Credit SOURCENAME” or similar, where SOURCENAME is the name of another feed 307 that posted source information (e.g., the name of a specific blog or online news provider) or the name of the original, underlying source (online content or otherwise). In other embodiments, source attributes can be in different formats as desired.
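By way of illustration only, the text-based attribute search described above could be sketched as follows. The regular expression, the function name `find_source_attributes`, and the assumption that an attribution occupies its own line are all hypothetical choices made for this sketch; the description does not prescribe any particular matching technique, and a practical matcher would need to guard against false positives such as an ordinary sentence that happens to begin with the word "Source".

```python
import re
from typing import List

# Hypothetical pattern (an assumption of this sketch): an attribution line
# such as "Via NAME", "Source: NAME" or "[Credit NAME]" standing on its own line.
_SOURCE_PATTERN = re.compile(
    r"^\s*\[?\s*(?:via|source|credit)\b\s*:?\s*(?P<name>[^\]\n]+?)\s*\]?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_source_attributes(item_text: str) -> List[str]:
    """Return the source names credited in a content item, in order of appearance."""
    return [m.group("name").strip() for m in _SOURCE_PATTERN.finditer(item_text)]


if __name__ == "__main__":
    article = "New phone announced today.\nVia Engadget\n"
    print(find_source_attributes(article))
```

An item with no attribution line yields an empty list, which corresponds to the case where no source attribute is found.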
When a source attribute is found in a specific content item 301 being analyzed, a subscription determining module 319 of the redundancy reducing management system 101 determines whether the identified source is on the list 313 of feeds 307 to which the user 303 currently subscribes. If the user 303 subscribes to the feed 307 that provided the information on which the current content item 301 is based, then providing the current content item 301 to the user would be redundant. This is so because it can be assumed that the user 303 has already received (or will receive) the underlying content item 301 from the source feed 307. Thus, the redundant content item is not presented to the user 303 (e.g., it is removed from the feed 307 prior to being displayed to the user 303). On the other hand, if no source attributes are found in a content item 301 being analyzed, or if only source attributes identifying sources to which the user 303 does not subscribe are found, then the content item 301 is not considered to be redundant, and is made available to the user 303 through the multi-source feed reader 305.
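The decision just described can be sketched, under stated assumptions, as a single predicate. The function name `is_redundant`, the use of plain strings for feed names, and the case-insensitive comparison are assumptions of this sketch and are not taken from the description; the credited source names are assumed to have been extracted already.

```python
from typing import Iterable, Set


def is_redundant(
    origin_feed: str,
    credited_sources: Iterable[str],
    subscriptions: Set[str],
) -> bool:
    """Return True when the item credits a different feed that the user subscribes to."""
    origin = origin_feed.casefold()
    subscribed = {name.casefold() for name in subscriptions}
    return any(
        source.casefold() != origin and source.casefold() in subscribed
        for source in credited_sources
    )


if __name__ == "__main__":
    subs = {"Gizmodo", "Engadget"}
    # Credits a subscribed feed other than its own: withheld from the user.
    print(is_redundant("Gizmodo", ["Engadget"], subs))
    # No source attribute found: provided to the user.
    print(is_redundant("Gizmodo", [], subs))
    # Credits a source the user does not subscribe to: provided to the user.
    print(is_redundant("Gizmodo", ["Reuters"], subs))
```

An item for which the predicate is false is passed through to the multi-source feed reader 305 unchanged.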
In one embodiment, when the above described analysis discovers that the feeds 307 subscribed to by the user 303 provide redundant content items 301, a redundant item classifying module 321 of the redundancy reducing management system 101 classifies the multiple redundant content items 301 into a hierarchy. An optional “show redundant items” feature could then display this hierarchy of redundant content items 301 to the user 303. For example, suppose the analysis discovers that the user 303 subscribes to a first feed 307 (feed A) which provides an article (content item A). The user 303 also subscribes to two additional feeds (feeds B1 and B2), each of which provides a separate article (content items B1 and B2) based on content item A. Finally, the user 303 subscribes to yet another feed C, which provides content item C, which is based on content item B1. Under this scenario, absent the use of the “show redundant items” feature, the redundancy reducing management system 101 would only display content item A to the user 303. However, if the user 303 elects to show the redundant items (e.g., by operating a corresponding user interface component (not illustrated)), a graphic depiction of the hierarchy of redundant items A, B1, B2 and C would be displayed, prompting the user 303 to select and view any of these redundant content items 301 if desired. If the user 303 elects not to show redundant items, the hierarchy would not be displayed, and of the group of items only item A would be presented to the user 303.
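As a minimal sketch of the classification in this example, the hierarchy can be represented as a mapping from each content item to the items based directly on it. The function names `build_hierarchy` and `render`, the dictionary representation, and the indented text output are hypothetical and chosen only for illustration; the description does not specify a data structure or the form of the graphic depiction.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_hierarchy(based_on: Dict[str, Optional[str]]) -> Dict[Optional[str], List[str]]:
    """Map each item to the items derived directly from it; the key None holds root items."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for item, source in based_on.items():
        children[source].append(item)
    return dict(children)


def render(
    children: Dict[Optional[str], List[str]],
    parent: Optional[str] = None,
    depth: int = 0,
) -> List[str]:
    """Return one indented line per item, depth first, as a stand-in for a graphic view."""
    lines: List[str] = []
    for item in sorted(children.get(parent, [])):
        lines.append("  " * depth + item)
        lines.extend(render(children, item, depth + 1))
    return lines


if __name__ == "__main__":
    # The scenario from the text: B1 and B2 are based on A, and C is based on B1.
    based_on = {"A": None, "B1": "A", "B2": "A", "C": "B1"}
    print("\n".join(render(build_hierarchy(based_on))))
```

In this sketch only the root item (A) would be shown by default, and the full indented listing corresponds to what the "show redundant items" feature would reveal.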
It is to be understood that the above-described functionality of the redundancy reducing management system 101 enables the multi-source feed reader 305 to avoid displaying redundant content items 301 to the user even where the redundant content items 301 are not identical duplicates, but are in the form of multiple content items 301 based directly and/or indirectly on a common source, such that the redundant content items 301 contain differently worded and/or formatted versions of the same and/or overlapping content.
As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the portions, modules, agents, managers, components, functions, procedures, actions, layers, features, attributes, methodologies, data structures and other aspects are not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, divisions and/or formats. The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or limiting to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain relevant principles and their practical applications, to thereby enable others skilled in the art to best utilize various embodiments with or without various modifications as may be suited to the particular use contemplated.