The invention relates generally to computer systems, and more particularly to an improved system and method for a user interface to navigate a collection of tags labeling content.
The collaborative efforts of users participating in social media services such as Wikipedia, Flickr, and Delicious have led to an explosion in user-generated content. The content can occur in various forms, such as text, photos, video, audio, or multimedia content. A popular way of organizing the content is through tagging. In fact, a considerable amount of such content is labeled by user-defined tags. The tags provided by the user provide useful descriptors of the content, especially in the case of multimedia. In Flickr, for example, users may upload and share photos, and may place tags on their own or others' photos. Such an online image sharing service may allow a user to append a tag to any photo in the system resulting in the addition of over a million tags each week to the collection of photos accessible through the service. For any of these applications, visualizing and navigating the space of such numerous tags presents a challenging task.
In order to explore content items in social media applications, users need to be able to browse the tags labeling and annotating the content items. Past techniques for visualizing this information have been functional but inadequate. Tag clouds are the de facto means to visualize what tags are used to describe the content of the property or the collection. See http://flickr.com/photos/tags/ and http://del.icio.us/tag/ for examples. Unfortunately, this visualization can be difficult to interpret due to lack of organization. All the tags are mixed together in one big soup. For a large collection of tags, big tag clouds can be even more difficult to interpret. Furthermore, most tools for browsing tags only offer means to go from one tag to another, but do not offer the possibility of exploring the portion of the tag space which is determined by a combination of tags or a query. As user-defined tags of content continue their explosive growth, users face the problem of exploring a potentially immense tag space without an ability to semantically explore such a collection of user-defined tags.
What is needed is a way to visualize and navigate a collection of user-defined tags by semantically exploring that collection. Such a system and method should allow users to effectively explore tag spaces at any depth in the collection, and accordingly browse collections of tagged content items.
Briefly, the present invention may provide a system and method for visualizing and navigating a collection of tags labeling content items in a user interface. In various embodiments, a client having a tag explorer may be operably coupled to a server for requesting tags and representative content items from storage. The tag explorer may generate a visualization in a user interface by displaying a categorized subset of related tags from a collection of tags labeling content items. The server may include an operably coupled tag subspace analyzer for selecting and ordering the categorized subset of related tags from the collection of tags labeling content items, a related tags engine for determining a subset of tags from the collection of tags labeling content items that are related to one or more tags in a query request, and a semantic classification engine for categorizing tags from the collection of tags.
The present invention may efficiently provide a user interface for navigating a collection of tags labeling content. To do so, one or more tags may be submitted in a query, a ranked list of related tags may be determined and clustered into categories, and then the clusters of related tags may be sent to a client device for display. A client device may display the categories and the related tags in each category by selecting a font size based upon a tag relatedness score that may represent the relative score of the tags. Representative content items labeled by the related tags may also be displayed.
Advantageously, the present invention may flexibly allow refining the search space of a collection of tags by adding additional tags to a search query or expanding the search space of the collection of tags by removing tags from a search query. The tag explorer may generate an updated visualization by displaying a categorized subset of related tags for the requested search scope from the collection of tags labeling content items. Online content publishing and social media applications may use the present invention for visualization and navigation of tags in a collection of tags labeling any types of content, including text, audio, images, video, multimedia content, and so forth. Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which:
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer system 100 may include a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer system 100 and includes both volatile and nonvolatile media. For example, computer-readable media may include volatile and nonvolatile computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer system 100. Communication media may include computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For instance, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
The system memory 104 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 106 and random access memory (RAM) 110. A basic input/output system 108 (BIOS), containing the basic routines that help to transfer information between elements within computer system 100, such as during start-up, is typically stored in ROM 106. Additionally, RAM 110 may contain operating system 112, application programs 114, other executable code 116 and program data 118. RAM 110 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by CPU 102.
The computer system 100 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, discussed above and illustrated in
The computer system 100 may operate in a networked environment using a network 136 to connect to one or more remote computers, such as a remote computer 146. The remote computer 146 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer system 100. The network 136 depicted in
The present invention is generally directed towards a system and method for visualization and navigation of a collection of tags labeling content items in a user interface. More particularly, the present invention may generate a visualization of a subspace of a collection of tags labeling content items in a user interface. As used herein, a tag means information that may label or annotate any type of content item including content such as text, audio, image, video, and multimedia content. A user may interact with the visualization of a subspace of the collection of tags to refine or expand the visualization of the subspace of the collection of tags. And an updated visualization may be generated for the requested search scope.
As will be seen, the techniques described may be applied for online content publishing and social media applications for visualization and navigation of tags in a collection of tags labeling any types of content, including text, audio, images, video, multimedia content, and so forth. As will be understood, the various block diagrams, flow charts and scenarios described herein are only examples, and there are many other scenarios to which the present invention will apply.
Turning to
In various embodiments, a client computer 202 may be operably coupled to one or more servers 212 by a network 210. The client computer 202 may be a computer such as computer system 100 of
The server 212 may be any type of computer system or computing device such as computer system 100 of
The server 212 may be operably coupled to storage of content items such as storage 222 that may include any type of content item 228 that may be labeled with a tag 226. In an embodiment, each tag 226 may be any type of keyword annotation of a content item 228, where content items may include, for example, bookmarks, photos, videos, video fragments, text, audio, and other multimedia content. Each tag 226 may be classified into a category 224.
There are many applications which may use the present invention for visualization and navigation of tags in a collection of tags labeling content. Online content publishing and social media applications are examples among these many applications. For any of these applications, new tags may be generated daily for both new and existing content items, and these additional tags may be incorporated into a collection of tags labeling content items. For instance, an online photographic sharing application may allow users to upload and share photographs, and may also allow users to annotate the photographs with tags. Such an application may provide an opportunity for communities of users to build a layer of information on top of a base of content using tags and annotations. Those skilled in the art may recognize that other online applications such as news article feeds, blogs or bulletin boards, and multimedia data applications such as images, songs, or movie clips may similarly have tags generated on top of the content. Such applications may use the present invention for visualizing and navigating tags labeling content items.
For example,
A user may choose to browse the collection of tags represented by the subset of popular tags displayed in
A user can choose to continue exploring the collection of tags by either adding tags to the tags of the search query, removing tags from the tags of the search query, or choosing one or more new tags for a search query. For instance, a user may choose to add “Serengeti” as a tag in the search query so that relevant tags for “safari Serengeti” may be explored. This is depicted in
The visualization of a subset of a collection of tags may also include a display of content items labeled by the subset of tags displayed. In an online photographic sharing application, representative images labeled by the subset of tags may also be displayed as illustrated in
Otherwise, if the request received may not be a request for a new tag, then it may be determined at step 904 whether a request may be received for removing one or more tags from the previous search string of tags. In an embodiment, a user may make a request to remove a tag from the tags of a search query by deleting one or more tags from the previous search string displayed in the text input field for a search query. If so, one or more tags may be removed from the list of tags in the search query at step 906 and processing may be continued at step 912 for the tag(s) in the modified search query.
If the request received may not be a request to remove one or more tags from the tags of a search query, then it may be determined at step 908 whether a request may be received for adding one or more tags to the previous search string of tags. In an embodiment, a user may make a request to add one or more tags to the tags of a search query by adding one or more tags to the previous search string displayed in the text input field for a search query. If a request is received for adding one or more tags, one or more tags may be added to the tag(s) in the search query at step 910 and processing may continue at step 912.
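The query-update branches described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `update_query` and the action labels "new", "remove", and "add" are hypothetical, and the query is assumed to be held as an ordered list of tag strings.

```python
from typing import List


def update_query(current: List[str], action: str, tags: List[str]) -> List[str]:
    """Return the tag list for the next search query.

    `action` is a hypothetical label for the request type:
    "new", "remove", or "add".
    """
    if action == "new":
        # A new query replaces the previous search string entirely.
        return list(dict.fromkeys(tags))
    if action == "remove":
        # Removing tags expands the search scope (steps 904 and 906).
        drop = set(tags)
        return [t for t in current if t not in drop]
    if action == "add":
        # Adding tags refines the search scope (steps 908 and 910).
        # dict.fromkeys drops duplicates while keeping the original order.
        return list(dict.fromkeys(current + tags))
    raise ValueError(f"unknown action: {action}")
```

In each case the returned list would then be passed on to the related-tag lookup of step 912.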
At step 912, a list of related tags in order by tag relatedness score may be obtained for the tag(s) in the search query from the collection of tags. To obtain a list of related tags in order by tag relatedness score for the tag(s) in the search query from the collection of tags, any ranked list of tags with a score may be used, including a list of most popular tags, a list of most recent tags, or a list of the most related tags for the tag(s) in the search query. For example, to determine a list of the most related tags for the tag(s) in the search query, tags that co-occur for content items in the collection of content items tagged by the collection of tags may be selected as related tags and ranked in order of their frequency of co-occurrence.
In particular, a list of related tags may be obtained by generating pairs of tags from each of the terms in a tag labeling an individual content item in the collection of content items tagged by the collection of tags. For example, if a content item such as a photo has the tag “Eiffel Tower, Paris, France”, the following pairs of tags may be generated: (Eiffel Tower, Paris), (Eiffel Tower, France), (Paris, France). The frequency of the co-occurrence of each pair of terms in tags may be counted for all of the content items in the collection of content items tagged by the collection of tags. This list of related tags with their frequency of co-occurrence may then be used to obtain a list of relevant tags in order by tag relatedness score for the tag(s) in the search query.
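The pair generation and counting just described might look like the sketch below. It assumes each content item is represented simply as a list of its tag terms; the function name `count_cooccurrences` and the sample `photos` data are illustrative only.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def count_cooccurrences(items: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Count how often each tag occurs and how often each pair co-occurs."""
    tag_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tags in items:
        unique = sorted(set(tags))  # ignore repeated tags on one item
        tag_counts.update(unique)
        for a, b in combinations(unique, 2):
            # frozenset makes the pair unordered: (Paris, France) == (France, Paris)
            pair_counts[frozenset((a, b))] += 1
    return tag_counts, pair_counts


# Illustrative data only.
photos = [
    ["Eiffel Tower", "Paris", "France"],
    ["Paris", "France"],
    ["Paris", "Louvre"],
]
tag_counts, pair_counts = count_cooccurrences(photos)
```

Here the first photo contributes the three pairs listed in the example above, and the pair (Paris, France) ends up with a co-occurrence count of 2.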
From the list of related tags with their frequency of co-occurrence, a list of related tags with their tag relatedness score may be obtained for each tag in the search query, and these lists may be merged to generate a single ranked list of relevant tags for the search query in order by a tag relatedness score. The tag relatedness score can be calculated in several ways using either symmetric or asymmetric co-occurrence measures. For example, for a query term q and a related term c, the score for c can be calculated using the probability of c given q, |q ∩ c|/|q|, the probability of q given c, |c ∩ q|/|c|, or the Jaccard coefficient of q and c, |q ∩ c|/|q ∪ c|. The lists may be merged to generate a single ranked list of relevant tags in a number of ways, including by voting. For instance, the appearance of a tag in a list counts as a vote for that tag. Tags may then be ordered by decreasing number of votes. Those skilled in the art will appreciate that the voting algorithm can be altered in many ways where the weight of each vote may be biased toward a set of tags with desired characteristics. For example, weights may depend upon the score of the related tag according to the co-occurrence measure, the rank of the related tag in the list of related tags, the frequency of the tag in the search query for which the list was generated, the frequency of the related tag, and so forth.
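As a rough sketch of the three co-occurrence measures and the unweighted voting merge, assuming the occurrence counts are already available as integers: the names `relatedness` and `merge_by_voting`, the measure labels, and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def relatedness(n_q: int, n_c: int, n_qc: int, measure: str = "jaccard") -> float:
    """Score candidate tag c against query tag q.

    n_q and n_c are the numbers of items labeled q and c respectively;
    n_qc is the number of items labeled with both.
    """
    if measure == "c_given_q":   # |q ∩ c| / |q|  (asymmetric)
        return n_qc / n_q
    if measure == "q_given_c":   # |q ∩ c| / |c|  (asymmetric)
        return n_qc / n_c
    if measure == "jaccard":     # |q ∩ c| / |q ∪ c|  (symmetric)
        return n_qc / (n_q + n_c - n_qc)
    raise ValueError(f"unknown measure: {measure}")


def merge_by_voting(ranked_lists: List[List[str]]) -> List[str]:
    """Merge per-query-tag lists: each appearance of a tag counts as one vote."""
    votes: Counter = Counter()
    for related in ranked_lists:
        votes.update(set(related))  # at most one vote per list
    # Decreasing number of votes; ties broken alphabetically (an assumption).
    return sorted(votes, key=lambda t: (-votes[t], t))
```

A weighted variant could replace the unit vote with a value derived from the score or rank of the tag in each list, as the paragraph above suggests.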
Upon obtaining a list of related tags in order by a tag relatedness score, the list of relevant tags may be clustered at step 914 by semantic category. In an embodiment, the list of related tags may be classified in categories by matching the tags using anchor texts with categories of web pages in a hyperlinked corpus of classified web pages as described in related copending U.S. Patent Application, Attorney Docket No. 1780, entitled “SYSTEM AND METHOD FOR CLASSIFYING TAGS OF CONTENT USING A HYPERLINKED CORPUS OF CLASSIFIED WEB PAGES,” assigned to the assignee of the present invention. At step 916, the list of related tags may be output by semantic category for display of a visualization of the tags.
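The grouping performed at step 914 can be illustrated with the sketch below. The classification method itself is described in the referenced copending application and is not reproduced here; a plain dictionary lookup (`category_of`) stands in for it as a hypothetical placeholder, and the "other" fallback category is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ScoredTag = Tuple[str, float]


def cluster_by_category(
    ranked: List[ScoredTag],
    category_of: Dict[str, str],
    default: str = "other",
) -> Dict[str, List[ScoredTag]]:
    """Group a ranked list of (tag, score) pairs by semantic category.

    Tags keep their relative rank order within each category.
    """
    clusters: Dict[str, List[ScoredTag]] = defaultdict(list)
    for tag, score in ranked:
        clusters[category_of.get(tag, default)].append((tag, score))
    return dict(clusters)
```

The resulting mapping from category to scored tags is the form that would be output at step 916 for display.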
After a category label has been selected for display from the list of categorized tags, a tag classified in the category may be obtained at step 1006 from the list of categorized tags and a font size may be selected at step 1008 for display of the tag based upon the tag score. The higher the tag score, the larger the relative font size may be, so that the font size reflects the relevance of the tag. And at step 1010, the tag may be displayed in the display area with the category label using the selected font size based on the tag score.
At step 1012, it may be determined whether the last tag in the given category has been displayed. If not, then processing may continue at step 1006 by obtaining a tag classified in the given category from the list of categorized tags. Otherwise, it may be determined at step 1014 whether the last category in the list of categorized tags has been displayed. If not, then processing may continue at step 1004 by displaying a category label from the list of categorized tags. Otherwise, a list of representative content items labeled by the tags from the list of categorized tags may be displayed at step 1016 in a display area for content items, and processing may be finished. In various embodiments, the display area for content items may be below the display area for the categorized tags.
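The display loop of steps 1004 through 1014 might be sketched as follows. The description only requires that a higher tag score yield a larger relative font size; the linear mapping, the 10 to 32 point range, and the plain-text output format are assumptions chosen for illustration, as are the names `font_size` and `render`.

```python
from typing import Dict, List, Tuple


def font_size(score: float, lo: float, hi: float,
              min_pt: int = 10, max_pt: int = 32) -> int:
    """Map a tag score linearly onto a font size (assumed point range)."""
    if hi == lo:
        # All scores equal: use the midpoint of the range.
        return (min_pt + max_pt) // 2
    frac = (score - lo) / (hi - lo)
    return round(min_pt + frac * (max_pt - min_pt))


def render(clusters: Dict[str, List[Tuple[str, float]]]) -> List[str]:
    """Produce one text line per category label and per tag."""
    scores = [s for tags in clusters.values() for _, s in tags]
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    lines: List[str] = []
    for category, tags in clusters.items():   # step 1004: category label
        lines.append(category)
        for tag, score in tags:                # steps 1006-1010: each tag
            lines.append(f"  {tag} ({font_size(score, lo, hi)}pt)")
    return lines
```

A real client would emit styled user interface elements rather than text lines, and would follow the loop with the display of representative content items at step 1016.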
Thus the present invention may generate a visualization in a user interface of a subspace of a collection of tags labeling content items by displaying a categorized subset of related tags from the collection of tags. A user may interact with the visualization of a subspace of the collection of tags to refine or expand the visualization of the subspace of the collection of tags. For instance, a user may refine the visualization of a subspace of the collection of tags by adding tags to a search query. Or a user may expand the visualization of a subspace of the collection of tags by removing tags from a search query. An updated visualization may be generated by displaying a categorized subset of related tags for the requested search scope from the collection of tags labeling content items. Those skilled in the art will appreciate that other controls and implementations may be used for changing the search scope of the subspace of the visualization of the collection of tags. For example, a symbol such as a plus sign may be displayed to the right of each tag so that a user may add tags to a query by clicking the plus sign to the right of the corresponding tag.
As can be seen from the foregoing detailed description, the present invention provides an improved system and method for visualization and navigation of a collection of tags labeling content items in a user interface. Related tags may be determined for a query and clustered into categories. The categories, the related tags in each category, and representative content items labeled by the related tags may then be displayed. The system and method may apply broadly to visualizing and navigating a collection of tags labeling any types of content, including text, audio, images, video, multimedia content, and so forth. As a result, the system and method provide significant advantages and benefits needed in contemporary computing and in online applications.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.