1. Field of Disclosure
The present disclosure generally relates to a context-based media file annotation recommendation system for annotating stock photography media files that presents a list of recommended annotations based on a selected set of similar annotations and a selected set of similar media files.
2. Brief Description of Related Art
In recent years, a number of large databases of digital images have been made accessible via the Internet. Typically, searchers looking for a specific digital image employ an image retrieval system for browsing, searching and retrieving images from the image databases. Most traditional image retrieval systems utilize some method of adding metadata to the images such as captioning, keywords, or descriptions and the like.
Subsequently, image retrieval can be performed by searching for text strings appearing in the metadata. Searching for a particular image in a large image database via image retrieval systems can be challenging at times. For most large-scale image retrieval systems, performance may depend upon the accuracy of the image metadata. Although the performance of content-based image retrieval systems has significantly improved in recent years, image contributors may still be required to provide appropriate keywords or tags that describe a particular image.
Previous work has explored methods for circumventing this problem. However, the application of tag recommendation techniques to online stock photography remains unexplored.
One shortcoming of co-occurrence based tag recommenders such as tag recommender 340 depicted in
As the name suggests, conventional tag co-occurrence based recommenders may only consider text strings of image tags and identify similarities between those text strings. Besides tags, images have many other attributes such as image type, color, texture, content, context, source, ratings, media type, and the like. These non-textual similarities can be of vital importance in generating the succinct image tags that are so crucial to generating stock photography revenue. Given that non-textual similarities between images can play an important role in suggesting appropriate and concise tags, a novel recommender is needed to address this shortcoming.
By way of introduction only, the present embodiments provide a context-based tag recommendation system for annotating stock photography media files, the system configured to: maintain a first database comprising a set of media files and a set of annotations associated with each media file; maintain a second database comprising a list of annotations associated with each media file in the first database and a corresponding set of co-occurring annotations; maintain a third database comprising a media file in the first database, a set of annotations associated with the media file, a list of similar media files, and a set of annotations associated with the list of similar media files; receive an image tag input; query the second database using the image tag input to identify a selected set of similar annotations; and present a list of recommended annotations based on the selected set of similar annotations and a selected set of similar images. The foregoing discussion of the preferred embodiments has been provided only by way of introduction. Nothing in this section should be taken as a limitation of the claims, which define the scope of the invention.
The present disclosure describes computer-implemented systems and methods, which may utilize an algorithm, for use in a graphical user interface employing efficient annotation strategies for tagging stock photography media files. The disclosed method recommends media file annotations based on tag co-occurrence and context information derived from a selected set of similar media files. With the increasing rate of multimedia content creation and the associated growth in media file collections, the task of retrieving relevant media files from a gigantic collection has become more challenging than ever. Typically, media file search systems do not conduct searches based on information contained in the media file itself. Instead, like other search engines, most media file search engines have access to metadata for many media files, wherein the metadata may be indexed and stored in a large database. When a media file search request is generated by a user specifying a search criterion, the media file search engine browses through the index to identify search results comprising a set of media files that meet the user-specified search criteria. The search engine generally presents the search results in order of relevancy. The usefulness of a media file search engine may depend on the relevance of the search results it generates, along with the ranking algorithms employed in ranking the search results in order of relevancy. Search results may also include a set of thumbnail media files, sorted by relevancy, where each thumbnail is a link back to the original web site where that media file is located.
In this specification, a media file may refer to any file in a digital storage device, such as an audio, video, or image file. These files come in different formats, such as mp3, aac, and wma for audio files, and mkv, avi, and wmv for video files. The terms “media file” and “image” or “image file” are used interchangeably throughout this specification. Media file indexing is one of the basic methods employed for conducting an efficient media file search. Media file indexing comprises associating a set of relevant keywords or tags with a relevant set of media files. Performance of media file search and retrieval systems may become increasingly dependent upon the quality of the media file tags. Thus, designating appropriate media file tag attributes can be extremely important for optimal performance of media file search and retrieval systems. Media file tags should ideally be both relevant and comprehensive.
In photo sharing sites such as Flickr, tags are usually provided by the contributor of the media file or members of the media file provider's social circle. The process of tagging media files, however, can be tedious and error prone. For this reason, tag recommendation can not only help reduce the burden of tag providers and speed up the tagging process, but can also improve the quality of the recommended tags.
Tag recommenders can be employed using two separate approaches. First, effective tag recommenders can be employed interactively, wherein tag recommenders suggest tags which can be either accepted or rejected by a user at the time a media file is uploaded in a media file collection. Alternatively, tag recommenders can be employed automatically where, based on a knowledge base, whenever a media file is uploaded in the media file collection, a tag recommender automatically generates a set of tags for the uploaded media file without any human intervention. In addition to speeding up the tagging process, a tag recommender also helps reduce the tag misspelling rate. Further, a tag recommender assists contributors who are asked to supply tags in a foreign language.
In stock photography, photographs of various subjects such as people, pets, and travel are licensed for specific uses. Instead of hiring a photographer for a specific assignment, licensed stock photographs often fulfill the demands of creative assignments at a lower cost. Presently, stock media files can be searched, purchased, and delivered online via several searchable online databases. Revenue generation on stock photography sites differs from revenue generation on conventional photography search sites. In particular, stock photography contributors may provide higher quality media files and are compensated when one of their media files is downloaded. This provides increased incentive for stock photography contributors to tag their media files effectively.
From a revenue generation perspective, it is essential that the stock photography media files are precisely annotated and carefully organized. Even for stock photography contributors, tagging still remains a time consuming and challenging task. Accordingly, in the field of stock photography, a need exists for an effective tag recommender.
In order to develop an effective tag recommender for the stock photography domain, a large collection of media files from a prominent stock photography site was analyzed to study the tagging behavior of contributors. When the results were compared to previous photo sharing research, it was observed that notable differences exist between the tagging behavior of stock photography contributors and that of contributors to other photo sharing services. It was observed that, by itself, media file retrieval backed by tags generated by a conventional tag-based recommender was unable to effectively locate stock photography media files. After studying contributors' motivations for providing tags, it was observed that organizational needs were more important than facilitating easy media file retrieval. [See: M. Ames and M. Naaman. “Why we tag: motivations for annotation in mobile and online media”. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, pages 971-980, New York, N.Y., USA, 2007. ACM.] [Ames et al.] This may be the case because generating a large set of tags for a given media file is known to be a difficult task [See: C. Wang, F. Jing, L. Zhang, and H.-J. Zhang. “Content-based image annotation refinement”. In CVPR, 2007.] [Wang et al.].
A crowd-sourcing approach has been proposed as an effective tool to ease the burden of tagging a large number of media files. One issue with a crowd-sourcing approach is that maintaining the quality of the generated tags can nonetheless be a challenge. An initial experiment was conducted using qualification tests to assert quality control while gathering image annotation data from Amazon's Mechanical Turk (MTurk) (the first experiment). Prescreening the MTurkers using the qualification tests and only allowing qualified candidates to undertake the annotation task was shown to be the best strategy for improving the quality of the generated image annotation data [See: C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. “Collecting image annotations using amazon's mechanical turk”. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT '10, pages 139-147, Stroudsburg, Pa., USA, 2010. Association for Computational Linguistics.] [Rashtchian et al.].
To overcome this deficiency, a co-occurrence based tag recommender was designed using the insight from the behavioral analysis. After exploring several different recommendation strategies, it was observed that tag recommender performance is significantly improved when the recommender uses a combination of tag co-occurrence context information and similar image context information, as compared to a recommender that uses tag co-occurrence context information alone. Thus, a novel recommender was designed that uses similar image information, along with tag co-occurrence information, to supply more context to the tag recommender.
Two independent experiments confirmed that the added context information often led to improved quality of the recommended tags. The new recommender was constructed to capture three different insights in the domain of stock photography: (1) tagging behavior in the stock photography setting is demonstrably different from that on popular photo sharing services; (2) studies exploring different tag co-occurrence measures indicate that a linear combination of two specific measures may lead to optimal performance of the tag recommender; and (3) the conducted studies empirically demonstrate that a novel strategy incorporating similar-images context information into the conventional co-occurrence based tag recommender can not only expand the range of contextual information made available to the tag recommender, but can also significantly improve the precision and accuracy of the recommended tags.
Previous research has analyzed the image tagging behavior of contributors to online photo sharing services [See: B. Sigurbjörnsson and R. van Zwol. “Flickr tag recommendation based on collective knowledge”. In Proceedings of the 17th international conference on World Wide Web, WWW '08, pages 327-336, New York, N.Y., USA, 2008. ACM.] [Sigurbjörnsson]. Typically, online photo sharing services such as Flickr do not impose any restriction either on the number of tags or on the nature of tags a user can provide for a given image. While image contributors of the online photo sharing services do have some motivation to tag their images, the level of motivation is not as high as the motivation of the image contributors of an online stock photography site [See: Ames et al.]. Resultantly, Flickr images may seldom have more than a handful of tags [Sigurbjörnsson]. As seen in
As discussed above, contributors of online stock photography are typically highly motivated because they are compensated only when an image is downloaded. If stock photography images are precisely tagged, then there is a greater probability that the images will be easily searched and downloaded. Hence stock photography contributors are highly motivated to supply accurate and concise tags. Accordingly, many stock photography sites undertake measures to avoid keyword spam. For the site considered in the first experiment, a hard limit of 50 tags per image was imposed. [First experiment]
To understand the implications of this tag limit, a set of one million images was randomly chosen from the stock photography collection.
A power law was not a good fit in the context of stock photography either.
Most stock photography images generally have a large number of tags. It is possible to employ the tag recommendation algorithm to recommend additional tags based on each of the large number of stock photography tags. However, a more typical scenario is to ask an image contributor to provide a small number of highly relevant tags for an image contributed by the image contributor, and to then recommend related tags based on the small number of highly relevant tags. In order to mimic the situation of a contributor providing the most pertinent tags for an image, Amazon's Mechanical Turk was used to gather tags for 200 stock photography images. The image set was chosen at random in a way that matched the image category distribution of the entire collection.
Now referring to
The results for this assignment were aggregated and a distribution of tag frequency was generated for each of the 200 images. To avoid including uncommon tags, any tag that was not suggested at least twice for an image was dropped, leaving at least ten unique tags for each image. This data set is subsequently referred to as FF-TAGS, reflecting the free-form nature of the tag gathering task.
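The aggregation step described above, keeping only tags suggested at least twice for an image, can be sketched as follows. The function and data here are illustrative assumptions, not code or data from the study:

```python
from collections import Counter

def aggregate_tags(suggestions, min_count=2):
    """Aggregate free-form tag suggestions for one image, dropping
    any tag proposed fewer than min_count times."""
    freq = Counter(tag.lower().strip() for tag in suggestions)
    return {tag: n for tag, n in freq.items() if n >= min_count}

# Hypothetical suggestions gathered from several workers for one image.
suggestions = ["beach", "sand", "ocean", "beach", "sand",
               "sea", "ocean", "beach", "vacation"]
kept = aggregate_tags(suggestions)
# "sea" and "vacation" are dropped: each was suggested only once.
```

The same threshold (two suggestions per image) then leaves only tags with some agreement among workers.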
Tag co-occurrence is a useful technique that provides a basis for tag recommendation. An overview of the recommender strategy using co-occurrence is presented in
In the case of Flickr, Sigurbjörnsson argued that an asymmetric co-occurrence measure provides the best tags:
Where the asymmetric co-occurrence measure of tag tj given tag ti is a normalized measure of how often the tags ti and tj appear together. The symmetric co-occurrence measure, i.e., the Jaccard coefficient for tags ti and tj, was considered as an alternative:
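The two measures themselves appear only as figures in the original and are not reproduced above. Following the standard definitions in the cited Sigurbjörnsson reference (with |ti| denoting the number of images carrying tag ti), they can be written as:

```latex
% Asymmetric measure: co-occurrences of t_i and t_j, normalized by |t_i|.
P(t_j \mid t_i) := \frac{\lvert t_i \cap t_j \rvert}{\lvert t_i \rvert}
% Symmetric measure (Jaccard coefficient): normalized by the union size.
J(t_i, t_j) := \frac{\lvert t_i \cap t_j \rvert}{\lvert t_i \cup t_j \rvert}
```

These reconstructions are term-by-term consistent with the SYM-ASYM combination given below.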
It was observed that the symmetric co-occurrence measure and the asymmetric co-occurrence measure often recommended different sets of tags. The asymmetric co-occurrence measure is susceptible to recommending very common tags too often.
On the other hand, the symmetric measure (SYM) may be too restrictive in some cases and tends to find tags with equivalent meaning. In an attempt to capture the benefits of both the symmetric co-occurrence measure and the asymmetric co-occurrence measure, a recommender using a linear combination of both the symmetric co-occurrence measure and the asymmetric co-occurrence measure was tested (SYM-ASYM):
SYM-ASYM(tj|ti):=λJ(ti,tj)+P(tj|ti)
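A minimal sketch of this linear combination, assuming co-occurrence counts are available as simple tag-count maps (the function names and toy counts are illustrative, not from the disclosure):

```python
def asym(co, count, ti, tj):
    """P(tj|ti): co-occurrences of ti and tj, normalized by |ti|."""
    return co.get((ti, tj), 0) / count[ti]

def jaccard(co, count, ti, tj):
    """J(ti,tj): co-occurrences of ti and tj, normalized by the union size."""
    inter = co.get((ti, tj), 0)
    union = count[ti] + count[tj] - inter
    return inter / union if union else 0.0

def sym_asym(co, count, ti, tj, lam=0.5):
    """SYM-ASYM(tj|ti) := lam * J(ti,tj) + P(tj|ti)."""
    return lam * jaccard(co, count, ti, tj) + asym(co, count, ti, tj)

# Toy counts: 10 images tagged "city", 8 tagged "skyline",
# and 6 images carrying both tags.
count = {"city": 10, "skyline": 8}
co = {("city", "skyline"): 6}
score = sym_asym(co, count, "city", "skyline", lam=0.5)
```

With these toy counts, J = 6/12 = 0.5 and P = 6/10 = 0.6, so the combined score is 0.85 at λ = 0.5.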
To obtain the weight parameter λ, the recommender was trained using the 100 training images obtained in FF-TAGS. The top four most-suggested tags for each image were provided as input tags to the recommender. Then the recommendations were evaluated using precision at rank 5 (P@5). Table 1 shown in
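Precision at rank 5, the evaluation measure used above, can be computed as follows. This is a generic sketch of the standard metric, not code from the study:

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommended tags that are relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for tag in top_k if tag in relevant)
    return hits / len(top_k)

# Hypothetical recommendations and ground-truth relevant tags.
p5 = precision_at_k(["sky", "sun", "cloud", "beach", "car", "dog"],
                    {"sky", "cloud", "beach"}, k=5)
# Three of the top five recommendations are relevant.
```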
Column 610 refers to the reading of the given measures for FF-TAGS. As discussed above in conjunction with
As will be discussed later with reference to
A novel tag recommender employing a combination search strategy is suggested in the present disclosure. Advantageously, the search strategy of the novel tag recommender supplements keyword co-occurrence searching with searching similar image information (SIM-IMG). A flow chart embodying this strategy is depicted in
At step 400, an image tag input is received from a user or a search engine, not shown in
Importantly, the selected set of similar images 440 provides additional context information available to the tag recommender 450, which enables the tag recommender to suggest to the image contributor a set of tags that adequately describes, accurately defines, and precisely labels the image being tagged. For example, the tag recommender 450 can consider the tags associated with the selected set of similar images 440. In a preferred embodiment of the disclosed invention, the tag recommender 450 may consider the frequency of a particular tag in the selected set of similar images 440. If a specific tag is associated with most of the images in the selected set of similar images 440, it is likely to be a good recommendation candidate. Similarly, if a specific tag is not associated with any of the images in the selected set of similar images 440, it is not likely to be a good recommendation candidate.
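The frequency heuristic described above, where a tag carried by most of the similar images is a strong candidate, might be sketched as follows (the function name, threshold, and tag sets are illustrative assumptions):

```python
from collections import Counter

def candidate_tags_from_similar(similar_image_tags, min_fraction=0.5):
    """Score each tag by the fraction of similar images carrying it,
    keeping tags present on at least min_fraction of the set."""
    n = len(similar_image_tags)
    freq = Counter(tag for tags in similar_image_tags for tag in set(tags))
    return {tag: c / n for tag, c in freq.items() if c / n >= min_fraction}

# Hypothetical tag sets for four similar images.
similar = [{"beach", "sand", "sea"},
           {"beach", "sea", "sunset"},
           {"beach", "palm"},
           {"beach", "sea", "sand"}]
candidates = candidate_tags_from_similar(similar)
# "beach" appears in 4/4 images, "sea" in 3/4, "sand" in 2/4;
# "palm" and "sunset" (1/4 each) fall below the threshold.
```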
Furthermore, if behavioral data is available for the selected set of similar images 440, it can further assist the tag recommender 450 in determining whether or not a particular tag should be recommended to the image contributor. For example, behavioral data can indicate whether a particular tag has appeared in queries that led to a similar image being downloaded or viewed. Further, behavioral data can also indicate how frequently a particular tag has appeared in queries that lead to a similar image being downloaded or viewed. Based on the behavioral data, the tag recommender 450 can determine whether or not a particular tag is recommended. In other words, a particular tag frequently appearing in queries that lead to a similar image being downloaded or viewed is a strong indication that the tag is also relevant to the input image which the image contributor is seeking to tag.
Once retrieved, various attributes can be derived from the selected set of similar images 440 and may be used by the tag recommender 450. Based on a set of preferences, specific attributes can be derived from the selected set of similar images. In one embodiment, the set of preferences is preconfigured by the image contributors. Alternatively, the set of preferences may be preconfigured by users of the tag recommender 450. This is in contrast to conventional co-occurrence based tag recommenders, which generate tag recommendations based entirely on tag co-occurrence. In one embodiment of the disclosed invention, the tag recommender 450 derived and used six different attributes from the set of similar images 440.
The attributes derived and used by the tag recommender 450 were (1) tags associated with the selected set of similar images (2) descriptions associated with the selected set of similar images (3) frequency of the tag occurrence in queries that lead to a similar image being downloaded (4) frequency of the tag occurrence in queries that lead to a similar image being viewed (5) tags recommended by the symmetric co-occurrence measure (6) tags recommended by the asymmetric co-occurrence measure. The tag recommender 450 used a weighted summing strategy to generate recommended tags after combining the six attributes derived from the selected set of similar images 440.
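The weighted summing strategy over the six attributes might look like the following sketch. The attribute names, per-attribute scores, and equal weights are illustrative placeholders, not values from the disclosure:

```python
def weighted_sum_score(attribute_scores, weights):
    """Combine per-attribute scores for one candidate tag into a single
    recommendation score via a weighted sum."""
    return sum(weights[name] * score
               for name, score in attribute_scores.items())

# Hypothetical per-attribute scores for one candidate tag.
scores = {
    "similar_image_tags": 0.8,   # fraction of similar images with the tag
    "descriptions": 0.2,         # match against similar-image descriptions
    "download_queries": 0.5,     # frequency in download-leading queries
    "view_queries": 0.4,         # frequency in view-leading queries
    "sym_cooccurrence": 0.6,     # symmetric co-occurrence score
    "asym_cooccurrence": 0.7,    # asymmetric co-occurrence score
}
weights = {name: 1.0 / 6 for name in scores}  # equal weights as a placeholder
total = weighted_sum_score(scores, weights)
```

In practice the weights would be tuned, e.g., on held-out training images, rather than set uniformly.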
It is noted that the tag recommender 450 processes the selected similar images generated at step 440 along with the selected similar tags generated at step 330 to present a set of tags to the image contributor, wherein the set of tags is based on both tag co-occurrence and similar images. The manner in which tag recommender 450 processes selected similar tags and selected similar images is further illustrated in conjunction with
Now referring to
At step 530, the tag recommender 540 generates a selected set of similar tags, wherein each tag in the selected set of tags is separate and distinct from the image tag input entered by the image contributor. The manner in which the selected set of similar tags is generated is discussed in further detail in conjunction with
Now referring back to table 1 in
Referring now to
All image-tag pairs were presented to five different MTurk workers who were asked to rate the recommended tag on the rating portion 830 of the interface 800. The rating portion 830 included tag rating gradations: “Not Good”, “Good”, “Very Good”, or “Don't Know”. As in Sigurbjörnsson, the gradations “Good” and “Very Good” were treated equally, turning the rating task into a binary rating task.
The ratings were tallied across the workers and ties were broken in favor of “Not Good”. Using this rating set, i.e., RATED-TAGS, P@5 was computed for the four above-mentioned recommenders. Results are shown in Table 1 in
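The tallying rule described above, five ratings per image-tag pair collapsed to a binary judgment with ties broken in favor of “Not Good”, can be sketched as (an illustrative implementation under the stated assumptions, including treating “Don't Know” as abstention):

```python
def binary_label(ratings):
    """Collapse worker ratings into a binary judgment.
    'Good' and 'Very Good' count as positive, 'Don't Know' is ignored,
    and ties are broken in favor of 'Not Good'."""
    pos = sum(1 for r in ratings if r in ("Good", "Very Good"))
    neg = sum(1 for r in ratings if r == "Not Good")
    return "Good" if pos > neg else "Not Good"

lbl1 = binary_label(["Good", "Very Good", "Not Good", "Good", "Not Good"])
lbl2 = binary_label(["Good", "Not Good", "Don't Know", "Not Good", "Good"])
# lbl2 is a 2-2 tie, which resolves to "Not Good".
```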
Now referring to
As seen in
The system also maintains a second database comprising a second set of records, each record comprising an annotation, at least one co-occurring annotation, and a count indicating a number of media files associated with the annotation and the at least one co-occurring annotation. A second database 1100 of the disclosed embodiment is illustrated in
Annotation 1120 indicates a tag entered by a user. Co-occurring annotations 1130 are the tags that have appeared with a given annotation 1120 on several occasions. The exact number of times the annotation and co-occurring annotation have appeared together is indicated by count 1140. For example, the tag “city” may be associated with (skyline, 10), (urban, 5), (buildings, 3), (cars, 2), (busses, 2), (billboards, 1) which represents the fact that 10 images contain both the tags city and skyline, 5 images contain both city and urban, etc. The system of the disclosed invention is configured to maintain title information and behavioral data associated with the set of similar image files, for example, how often the image was downloaded starting with a query for one of its tags.
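The second database described above, mapping each annotation to its co-occurring annotations and counts, can be built from the first database's per-image tag sets as in this minimal sketch (names and toy data are illustrative):

```python
from collections import defaultdict
from itertools import permutations

def build_cooccurrence(image_tags):
    """Count, for every ordered pair of tags, how many images carry both."""
    co = defaultdict(lambda: defaultdict(int))
    for tags in image_tags:
        for a, b in permutations(set(tags), 2):
            co[a][b] += 1
    return co

# Toy first database: three images and their tag sets.
images = [{"city", "skyline", "urban"},
          {"city", "skyline"},
          {"city", "urban", "cars"}]
co = build_cooccurrence(images)
# "city" co-occurs with "skyline" in 2 images, "urban" in 2, "cars" in 1.
```

Each inner map plays the role of the (co-occurring annotation, count) pairs stored in the second database.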
The system may then conduct a first query on the second database using the media file tag input to identify a ranked set of co-occurring annotations and a count associated with each annotation in the ranked set. The ranked set of co-occurring annotations is an ordered listing of co-occurring annotations from the second database 1100; for example, if the count associated with the tag sky is 10 and the count associated with the tag sun is 8, the tags are arranged in descending order of count.
From the ranked set of co-occurring annotations, the system may identify a selective group of annotations such that the count associated with each annotation in the selective group meets a predefined criterion. For example, in one embodiment of the disclosed invention, the system may define the selective group of annotations as those whose count is greater than 8.
Alternatively, any other number may be set as the predefined criterion. After identifying the ranked set of co-occurring annotations, the system may conduct a second query on the first database to identify a selected set of similar media files by iteratively retrieving image files associated with each annotation from the selective group of annotations. The system may then present a list of recommended annotations based on the ranked set of co-occurring annotations and the selected set of similar media files.
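Putting the two queries together, rank co-occurring annotations, keep those whose count exceeds the predefined threshold, and gather media files carrying each surviving annotation, might be sketched as follows (function names, threshold, and toy indexes are illustrative assumptions):

```python
def recommend(cooc, image_index, tag_input, min_count=8):
    """Rank co-occurring annotations by count, keep those above
    min_count, and gather media files carrying each survivor."""
    ranked = sorted(cooc.get(tag_input, {}).items(),
                    key=lambda kv: kv[1], reverse=True)
    selected = [tag for tag, count in ranked if count > min_count]
    similar = set()
    for tag in selected:
        similar.update(image_index.get(tag, []))
    return ranked, selected, similar

# Toy second database and tag-to-image index.
cooc = {"city": {"sky": 10, "sun": 9, "cars": 3}}
image_index = {"sky": ["img1", "img2"], "sun": ["img2", "img3"]}
ranked, selected, similar = recommend(cooc, image_index, "city")
# "sky" (10) and "sun" (9) exceed the threshold; "cars" (3) does not.
```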
The system may alternatively generate the set of similar media files based on a predefined criterion, such as appearance of the given media files in a same query, or appearance of the given media files in a same download transaction. The system may maintain title information and behavioral data associated with the set of similar media files. Similarly, the system may maintain the transaction history by maintaining, for each received media file tag input, the ranked set of co-occurring annotations and the selected set of similar media files. The system may also maintain transaction history such as media files downloaded from the first database for each received media file tag input.
Tagging remains an important component of an effective image retrieval system. In contrast to previous studies, a linear combination of symmetric and asymmetric co-occurrence measures improved recommended tag quality. The disclosed recommendation strategy incorporates similar images and a co-occurrence matrix to significantly improve the precision of recommended tags. In one embodiment of the disclosed invention, the selection of similar images is automated so that relevant tags can be added to images in an automated fashion.
Number | Name | Date | Kind |
---|---|---|---|
5893095 | Jain | Apr 1999 | A |
5983237 | Jain | Nov 1999 | A |
6212517 | Sato | Apr 2001 | B1 |
6332120 | Warren | Dec 2001 | B1 |
6526400 | Takata | Feb 2003 | B1 |
6721733 | Lipson | Apr 2004 | B2 |
6856987 | Kobayashi | Feb 2005 | B2 |
6904560 | Panda | Jun 2005 | B1 |
7006689 | Kasutani | Feb 2006 | B2 |
7400785 | Haas | Jul 2008 | B2 |
7664803 | Kobayashi | Feb 2010 | B2 |
7685198 | Xu | Mar 2010 | B2 |
7792811 | Nagarajayya | Sep 2010 | B2 |
8392430 | Hua | Mar 2013 | B2 |
8452794 | Yang | May 2013 | B2 |
8559731 | Mass | Oct 2013 | B2 |
8571850 | Li | Oct 2013 | B2 |
20020042794 | Konaka | Apr 2002 | A1 |
20050050469 | Uchimoto | Mar 2005 | A1 |
20050114319 | Brent | May 2005 | A1 |
20060206516 | Mason | Sep 2006 | A1 |
20070287458 | Gupta | Dec 2007 | A1 |
20080104065 | Agarwal | May 2008 | A1 |
20080282186 | Basavaraju | Nov 2008 | A1 |
20090138445 | White | May 2009 | A1 |
20110035350 | Zwol | Feb 2011 | A1 |
20110145327 | Stewart | Jun 2011 | A1 |
20110176737 | Mass | Jul 2011 | A1 |
20110218852 | Zhang | Sep 2011 | A1 |
20110282867 | Palermiti, II | Nov 2011 | A1 |
20120101893 | Tsai | Apr 2012 | A1 |
20120219191 | Benzarti | Aug 2012 | A1 |
20120226651 | Chidlovskii | Sep 2012 | A1 |
20120233170 | Musgrove | Sep 2012 | A1 |
20120254151 | Reitter | Oct 2012 | A1 |
20120323784 | Weinstein | Dec 2012 | A1 |
20120324408 | Shacham | Dec 2012 | A1 |
20130202205 | Liu | Aug 2013 | A1 |
20140067882 | Ikeuchi | Mar 2014 | A1 |
20140280232 | Chidlovskii | Sep 2014 | A1 |
Number | Date | Country |
---|---|---|
0473186 | Apr 1999 | EP |
1415245 | Feb 2011 | EP |
Entry |
---|
L. Wu, L. Yang, N. Yu and X-S. Hua. Learning to tag. In Proceedings of the 18th international conference on World Wide Web, WWW '09, pp. 361-370, New York, New York, USA 2009. ACM. |
M. Ames and M. Naaman. Why we tag: motivations for annotation in mobile and online media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, pp. 971-980, New York, NY, USA, 2007. |
C. Wang, F. Jing, L. Zhang, and H.-J. Zhang. Content-based image annotation refinement. In CVPR, 2007. |
C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT '10, pp. 139-147, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics. |
R. Datta, D. Joshi, J. Li, and J.Z. Wang. Image Retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2): 1-60, 2008. |
B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th International Conference on World Wide Web, WWW '08, pp. 327-336, New York, NY, USA, 2008. ACM. |
“NicheBot.com”, NicheBOT, 2016, https://web.archive.org/web/20130814214552/http://www.nichebot.com (last visited Aug. 14, 2013). |
“KeywordDiscovery.com”, a Trellian Company, 2004-2015, https://web.archive.org/web/20150421051135/http://keyworddiscovery.com/ (last visited May 20, 2015). |
“WordTracker.com”, Wordtracker LLP, 1998-2016, https://web.archive.org/web/20160914160615/https://www.wordtracker.com/ (last visited Sep. 14, 2016). |
“spellweb.com”, Markup.net, Inc., 2012-2016, https://web.archive.org/web/20160715091832/http://www.spellweb.com/ (last visited Jul. 15, 2016). |
Number | Date | Country | |
---|---|---|---|
20140280113 A1 | Sep 2014 | US |
Number | Date | Country | |
---|---|---|---|
61781073 | Mar 2013 | US | |
61873514 | Sep 2013 | US |