1. Field of the Invention
The present invention relates to organizing images and more specifically to gathering incidental information associated with an image and using that incidental information in a later search and retrieval process.
2. Introduction
With digital cameras now ubiquitous, people are constantly taking and storing digital images. For example, many people have cell phones or other portable devices capable of capturing still images and short video sequences. These images are often ultimately uploaded to a computer and stored in various file folders. Because it is so easy to take many pictures, people have difficulty organizing and maintaining their various stores of pictures. Often, a person wants to retrieve a particular picture but cannot remember which device it is stored on or which folder it is stored in. Accordingly, what is needed in the art is an improved method and system for organizing a large number of pictures.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth herein.
Disclosed are systems, methods and computer-readable media for organizing images. A method embodiment of the invention includes receiving an image into a device, receiving incidental information associated with the image, organizing the image and incidental information into a sparse array or other data structure, classifying the received image with an image classifier, storing the classified image in an image database, receiving a search query, and responding to the search query by searching for and returning matching images from the image database based on a comparison of the search query to the sparse array or other data structure. All automatically associated data, including classification data, can be stored in the same sparse array or data structure.
Another aspect of this embodiment relates to the sparse array being infinite in length and open to receiving further incidental information. In this context, after the initial receipt of an image (for example, by taking a picture with a cell phone), the method gathers secondary incidental information upon a further viewing of the image. For example, if a user takes a picture at noon and makes comments about the picture, incidental information such as the time, the temperature and the audio signal can be stored in a sparse array or other data structure and associated with that image. Later, when the user pulls up the image on the cell phone and provides further comments, additional secondary incidental information can be stored in the sparse array or other data structure to further aid later retrieval of the image.
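As a rough illustration of an array that stays open to new entries, the sketch below models the sparse array as a dictionary in which any absent feature reads as zero. The class name `OpenSparseArray` and the `"word:dog"`-style feature keys are hypothetical choices made for this example, not structures defined by the disclosure.

```python
from collections import defaultdict


class OpenSparseArray:
    """Hypothetical open-ended sparse array: new feature keys can be added at any time."""

    def __init__(self):
        # Absent keys read as 0.0, so the array behaves as if it were infinitely long.
        self._weights = defaultdict(float)

    def add(self, feature, weight=1.0):
        # Accumulate evidence: a feature observed again gains weight.
        self._weights[feature] += weight

    def get(self, feature):
        # dict.get avoids creating an entry just because a feature was read.
        return self._weights.get(feature, 0.0)

    def features(self):
        return dict(self._weights)


# Incidental information captured when the picture is taken at noon (hypothetical keys).
photo = OpenSparseArray()
for feature in ("time:12:00", "temp:24C", "word:dog", "word:park"):
    photo.add(feature)

# Secondary incidental information gathered during a later viewing with new comments.
photo.add("word:dog")
photo.add("word:birthday", 0.5)
```

Because nothing is preallocated, later viewings simply add or strengthen entries without any change to the stored layout.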
In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Various embodiments of the invention are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the invention.
With reference to
Although the exemplary environment described herein employs the hard disk, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital versatile disks, cartridges, random access memories (RAMs), read only memory (ROM), a cable or wireless signal containing a bit stream and the like, may also be used in the exemplary operating environment.
To enable user interaction with the computing device 100, an input device 174 represents any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input and so forth. The device output 170 can also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with the computing device 100. The communications interface 172 generally governs and manages the user input and system output. There is no restriction on the invention operating on any particular hardware arrangement, and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.
For clarity of explanation, the illustrative system embodiment is presented as comprising individual functional blocks (including functional blocks labeled as a “processor”). The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For example the functions of one or more processors presented in
As has been noted above, an aspect of the present disclosure relates to a process and system for organizing images.
In one aspect, NTDOA or TDOA data can be gathered from other equipment that is not tightly coupled to the primary receiving device. For example, if a group of people are all taking pictures at the same location with cell phone cameras, or are in the vicinity of someone with a cell phone camera who is taking a picture, then incidental data from those devices can contribute to the data in the data structure. The system therefore coordinates and weighs potential incidental data based on various parameters, such as time, distance from the primary device, orientation, similarity of composition (the same people in the pictures), and so forth. For example, two cameras may be next to each other but facing opposite directions. In that case, the visual information from each camera may carry less weight in the data structure than the audio or temperature data, due to the particular orientation of each camera. Information from other devices may be deemed relevant either through weighting or via a threshold analysis; a device may thus be treated as a neighboring device, and its data included, based on such an analysis.
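The weighting and threshold analysis can be sketched as follows. This is only one possible scheme, assumed for illustration: relevance decays exponentially with time and distance, visual data is additionally scaled by how closely the two cameras face the same way, and a fixed threshold decides which modalities to keep. The `NeighborReading` fields, the 60 s and 50 m scales, and the 0.2 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class NeighborReading:
    """Hypothetical summary of another device near the primary camera."""
    seconds_apart: float     # absolute difference in capture time
    meters_apart: float      # distance from the primary device
    heading_diff_deg: float  # 0 = facing the same way, 180 = opposite


def neighbor_weights(r, time_scale=60.0, dist_scale=50.0):
    # Time and distance shrink the relevance of everything the neighbor reports.
    base = math.exp(-r.seconds_apart / time_scale) * math.exp(-r.meters_apart / dist_scale)
    # Orientation only matters for visual data: 0 deg -> 1.0, 90 deg or more -> 0.0.
    facing = max(0.0, math.cos(math.radians(r.heading_diff_deg)))
    return {"visual": base * facing, "audio": base, "temperature": base}


def relevant_modalities(r, threshold=0.2):
    # Threshold analysis: keep only modalities whose weight clears the bar.
    return {m: w for m, w in neighbor_weights(r).items() if w >= threshold}


# Two cameras side by side but facing opposite directions.
side_by_side = NeighborReading(seconds_apart=0, meters_apart=1, heading_diff_deg=180)
kept = relevant_modalities(side_by_side)
```

In this example the neighbor's audio and temperature survive while its visual data is dropped, which mirrors the opposite-facing cameras case in the text.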
Feature 192 generally represents incidental information that may be provided to the system, such as data regarding current events and weather. For example, a picture may be taken of a large warehouse fire, and chronologically relevant incidental information may exist on the Internet discussing a warehouse fire; that information can be correlated with the location of the device 180, and the resulting data may be stored in a sparse array associated with the particular image. Feature 186 represents an information retrieval and “cooked” database; that is, the sparse arrays are organized in a form that is preferable for searching. This database may represent the images and their associated sparse arrays, and it is preferably organized for using the sparse arrays to retrieve particular images. As images populate the database 186, a user may use a display device 188 (which may or may not be the same as device 180) to submit search queries, which can be processed by retrieval engines and a categorization display front end 190 that communicates with the database 186 to enable the searching, matching and returning of an appropriate image or images in response to the user query. Conceptually, the matching process may involve determining whether a name in one sparse array (a search array) matches a name in the stored array for an image. If the names match, the values within the arrays may then be compared in their aligned state to determine a final match.
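The two-stage matching described above (align on names, then compare values) might look like the sketch below. The relative `tolerance` test and the "fraction of query names matched" score are assumptions chosen for illustration; the disclosure does not specify how aligned values are compared.

```python
def match_score(search, stored, tolerance=0.5):
    """Hypothetical two-stage match between a search array and a stored image array.

    Stage 1 aligns the two arrays on names (feature keys) present in both.
    Stage 2 compares the aligned values: a name counts as matched when the
    stored value lies within `tolerance` (relative) of the searched value.
    """
    shared = search.keys() & stored.keys()  # stage 1: name alignment
    if not shared:
        return 0.0
    matched = 0
    for name in shared:  # stage 2: value comparison in the aligned state
        want, have = search[name], stored[name]
        if abs(have - want) <= tolerance * max(abs(want), 1e-9):
            matched += 1
    # Fraction of the query's names that matched both by name and by value.
    return matched / len(search)


query = {"fire": 1.0, "warehouse": 1.0}
score = match_score(query, {"fire": 0.8, "warehouse": 1.2, "night": 1.0})
```

Names present only in the stored array (such as `"night"` here) neither help nor hurt, since alignment is driven by the search array.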
Accordingly, an aspect of the disclosure enables the automatic organization of an image based on gathered incidental information such as date, time, position, orientation of a device, word spotting in received audio, analysis of music or other non-word audio, analysis of prosody in a user's voice, and so forth. For example, if a user takes a picture and the accompanying audio includes people screaming, loud noises in the background or high tension in the user's voice, that incidental information may be used to add information to the sparse array beyond the words spotted in the user's speech. All the incidental information is stored in the database in association with the image and can be used for search and retrieval. Incidental information may include key words, such as names or other nouns, that are spoken and recognized around the time the photo was taken, information about who was nearby at the time the photo was taken, and so forth. E911 techniques, as well as pattern recognition, may be used to gather some of this incidental information. As noted above, other incidental information may refer to weather, stock market information or any other data.
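A minimal sketch of turning capture-time audio into sparse-array entries is shown below. It assumes a transcript is already available from a speech recognizer, uses a simple keyword list for word spotting, and substitutes a root-mean-square loudness threshold as a stand-in for the richer non-word analysis (screaming, prosody) the text mentions. The function name, feature keys and `loud_rms` value are hypothetical.

```python
import math


def audio_features(transcript_words, samples, keywords, loud_rms=0.5):
    """Hypothetical audio-derived incidental features for one image.

    transcript_words: words recognized around capture time (assumed input)
    samples: audio samples normalized to [-1.0, 1.0]
    keywords: names and other nouns worth spotting
    """
    features = {}
    for word in transcript_words:
        w = word.lower()
        if w in keywords:  # word spotting
            features["word:" + w] = features.get("word:" + w, 0) + 1
    if samples:
        # Root-mean-square level as a crude non-word cue for loud background noise.
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        if rms >= loud_rms:
            features["audio:loud"] = 1
    return features


feats = audio_features(["Look", "Spot", "spot"], [0.9, -0.8, 0.7], {"spot"})
```

A production system would replace the RMS check with proper acoustic event and prosody classifiers; the point here is only that non-word cues become entries alongside the spotted words.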
The classifier 182 may use particular templates in its classification process. Such “named” templates may be supplied by the system or by the end user to categorize the images. For example, to create a personal template for a family, the family may submit a series of family pictures, which may be used to train a template to identify particular family members rather than just identifying generically that people are in images. Of course, any set of images may be used to train templates that identify animals, cars, planes, etc. Standard templates for various types of objects may also be used. In this regard, personal templates may provide more accurate automatic classification of images taken by cameras or phones owned by members of that family, yielding improved information about exactly who is in each image. In one aspect, comparisons against personal templates can be weighted somewhat higher than comparisons against standard templates. The user of a personal template may even be able to “tag” the template with audio, text or other incidental data that may improve the population of sparse arrays; for example, by providing audio samples of each person in a photo along with identification of who said what.
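The preference for personal templates can be expressed as a simple multiplicative boost on classifier scores. The sketch below assumes scores in [0, 1] keyed by `(template_kind, label)` and a boost of 1.2; both the data layout and the boost value are hypothetical.

```python
def identify(scores, personal_boost=1.2):
    """Pick the best label from template comparison scores (hypothetical scheme).

    `scores` maps (template_kind, label) -> similarity in [0, 1], where
    template_kind is "personal" (trained on the family's own photos) or
    "standard" (generic object or person templates).
    """
    best_label, best_score = None, 0.0
    for (kind, label), score in scores.items():
        # Personal templates are weighted a bit higher than standard ones.
        weighted = score * (personal_boost if kind == "personal" else 1.0)
        if weighted > best_score:
            best_label, best_score = label, weighted
    return best_label, best_score


label, _ = identify({("standard", "person"): 0.80, ("personal", "Donny"): 0.70})
```

Here the weaker raw match to Donny's personal template still wins, so the sparse array records who is in the picture rather than just that a person is present. A strong standard match can still win when the personal match is poor.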
In one implementation, images taken on a cell phone are automatically transferred to a storage facility 182 where further complex processing can occur. The incidental and associated data can be stored in a number of ways that enable easy search and retrieval. A user then logs into an image archival service via module 182 or 190, for example, and launches a search of photos. The user could also upload a photo and request a search for similar matches across an image archive such as Flickr or a photo catalog. If matches are found, they are presented in an interface tailored for presenting the results. One preferable implementation stores the data in a sparse array or data structure similar to those used by those of skill in the art for information retrieval. Sparse arrays are well suited to many search tasks, including routing algorithms, in which incoming items can be quickly categorized or routed based on a sample sparse matrix specified for each category or routing application. One example of a sparse array uses integers that count each occurrence of a particular event of interest. For example, you may have a series of 20 photographs, with your dog Spot appearing in ten of them. The sparse array associated with each image can have an entry indicating the number of times a particular object, such as Spot, is found in a series of images. Similarly, a sparse array can store an integer relating to the frequency of words spoken before, during and/or after the time an image was taken. In this manner, a sparse array or data structure can be considered an infinite-length array or structure that stores numbers associated with events related to the image.
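The counting example above (Spot appearing in ten of 20 photographs, plus word frequencies) can be sketched with Python's `collections.Counter`, which stores only the keys that occurred and returns 0 for any other key. The `"object:"` and `"word:"` key prefixes are a hypothetical naming convention.

```python
from collections import Counter


def build_sparse_array(detected_objects, spoken_words):
    """Hypothetical sparse array for one image: integer counts keyed by feature.

    Only features that actually occurred are stored; every other entry of the
    conceptually infinite array is an implicit zero.
    """
    array = Counter()
    for obj in detected_objects:
        array["object:" + obj] += 1
    for word in spoken_words:  # words spoken around the time of capture
        array["word:" + word.lower()] += 1
    return array


def count_images_with(arrays, feature):
    # In how many images of a series does the feature (e.g. Spot) appear?
    return sum(1 for a in arrays if a[feature] > 0)


# A series of 20 photographs, with Spot detected in every other one.
series = [build_sparse_array(["Spot"] if i % 2 == 0 else ["tree"], []) for i in range(20)]
spot_count = count_images_with(series, "object:Spot")
```

Reading `a[feature]` on a `Counter` for an absent key returns 0 without inserting it, so the arrays stay sparse.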
When somebody wishes to search the images, a query sparse array can be created, and a retrieval engine or processing algorithm may then compare it with the stored sparse arrays, using various metrics known to those of skill in the art, to identify matching images.
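Cosine similarity is one widely used metric for comparing sparse arrays in information retrieval; the sketch below uses it as an example of the "various metrics" mentioned, not as the metric the disclosure prescribes. Arrays are assumed to be dictionaries of `{feature: weight}`, and the image IDs are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse arrays stored as {feature: weight}."""
    # Only features present in both arrays contribute to the dot product.
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, database, top_k=3):
    """Rank stored images by the similarity of their sparse arrays to the query array."""
    scored = [(cosine(query, arr), image_id) for image_id, arr in database.items()]
    scored.sort(reverse=True)
    return [image_id for score, image_id in scored[:top_k] if score > 0]


db = {"img1": {"dog": 2, "park": 1}, "img2": {"truck": 1}, "img3": {"dog": 1}}
results = search({"dog": 1}, db)
```

Cosine normalizes away array length, so an image with many unrelated features does not outrank a focused one merely by having more entries.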
In addition to sparse arrays, the data structure may also be arranged using XML. An example entry is the following:
Typically, sparse arrays assume that they are modeling a large area of memory with the location being significant. For example, array location 23 (410) might be used in all cases to store the weight or weighting associated with the word “dog.”
Because IR-style sparse arrays assume this fixed correspondence between a specific word and a location in the array, computation is much faster. In the XML case, the word must be found via some kind of lookup, often hashing but in some cases linear comparison. The IR-style sparse arrays described herein avoid this lookup.
In memory, sparse arrays can be represented as lists of pairs of an index and a value or weight, for example the pairs (44, 23) and (54, 2), where the first integer is the array location and the second is the weight.
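The pair-list representation and the benefit of fixed locations can be illustrated together: if both lists are sorted by location, two sparse arrays can be compared in a single merge pass with no string lookup at all. The sketch below computes a dot product this way; the function names are hypothetical, and location 23 standing for "dog" follows the example in the text.

```python
def to_pairs(dense):
    """Keep only non-zero entries as sorted (array location, weight) pairs."""
    return [(i, w) for i, w in enumerate(dense) if w != 0]


def dot(pairs_a, pairs_b):
    """Dot product of two sorted pair lists in one merge pass.

    Because each location always denotes the same feature (e.g. location 23
    is always "dog"), matching entries line up by integer comparison alone.
    """
    i = j = 0
    total = 0
    while i < len(pairs_a) and j < len(pairs_b):
        loc_a, w_a = pairs_a[i]
        loc_b, w_b = pairs_b[j]
        if loc_a == loc_b:
            total += w_a * w_b
            i += 1
            j += 1
        elif loc_a < loc_b:
            i += 1
        else:
            j += 1
    return total


image = [(23, 3), (44, 23), (54, 2)]  # location 23 = "dog", weight 3
query = [(44, 1), (54, 5), (90, 7)]
score = dot(image, query)  # 23*1 + 2*5
```

The pass costs time proportional to the number of stored pairs, independent of how large the conceptual array is.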
There are variants on this technique that resemble CDR coding in some Lisp implementations, in which tags (strings) and array indices are mixed together. The software implementing the arrays checks each index value to determine whether it is an integer or a string. More generally, the data structure used herein to store the incidental data can be a sparse array, XML or a relational database scheme.
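The per-index type check in the mixed variant can be sketched as below. This does not implement CDR coding itself; it only shows a pair list whose indices may be integers (fixed IR-style locations) or strings (XML-style tags), with each index inspected to decide how it is stored and looked up. All names and the sample tag are hypothetical.

```python
def split_mixed(entries):
    """Separate a mixed list of (index, weight) pairs into positional and tagged parts."""
    positional, tagged = {}, {}
    for index, weight in entries:
        if isinstance(index, int):    # IR-style fixed array location
            positional[index] = weight
        elif isinstance(index, str):  # XML-style tag
            tagged[index] = weight
        else:
            raise TypeError(f"unsupported index type: {type(index).__name__}")
    return positional, tagged


def weight_of(entries, key):
    # Integer keys go to the positional part, string keys to the tagged part.
    positional, tagged = split_mixed(entries)
    table = positional if isinstance(key, int) else tagged
    return table.get(key, 0)  # absent entries are implicit zeros


entries = [(23, 3), ("location:beach", 1), (54, 2)]
```

Rebuilding the two parts on every lookup keeps the example short; a real store would split once and reuse the result.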
As has been shown herein, the step of receiving incidental information can occur before, after and/or during the taking of the picture. Other aspects of the method can include the incidental information relating to at least one of environmental data, chronologically relevant audio input to the device, NTDOA data, location data, device orientation data, news data and so forth. The environmental data can relate to at least one of time, date, location information, color, object detection, temperature, received audio when the image was received by the device, audio received prior to or after the image was received and so forth.
As noted above, another aspect of the disclosure relates to receiving secondary incidental information and adding that information to the sparse array. In this regard, after an image is stored in the image database and upon retrieval of the image from the image database, the method receives second incidental information associated with the image and stores it in the sparse array. This step preferably occurs during a later viewing of the stored image by a user. Thus, as the user retrieves the image, the system receives secondary incidental information associated with the image and may assign a modified weight to at least a portion of that information, which is then stored in the sparse array. The weighting of the secondary incidental information may be based on at least one parameter that relates data associated with the original incidental information to data associated with the secondary incidental information. For example, if the user retrieves the image six months after it was taken, a parameter relating the initial incidental information (such as the time the picture was taken) to the secondary information (received six months later) may be used to modify, to a certain degree, the weights given to the secondary incidental information. The weights may increase or decrease.
However, if the secondary incidental information, although received six months later, is information-rich audio that includes a discussion of the image, the time difference may be outweighed by a characterization of the content of that incidental information, and the data that goes into the open-ended sparse array may then receive a higher weight than data associated with the image at the time it was taken. For example, it may be determined that the audio spoken by the user when the image was taken six months earlier had nothing to do with the image: the user may have been discussing a car or current events while taking a picture of their dog and a tree. Accordingly, the weights given to various pieces of information in the sparse array can easily be manipulated and modified based on analysis of the various parameters and incidental information.
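One way to combine the elapsed-time parameter with a characterization of the later information is sketched below. The half-life decay and the multiplicative `relevance` factor are assumptions for illustration; the disclosure only states that weights may increase or decrease based on such parameters.

```python
def secondary_weight(base_weight, days_since_capture, relevance=1.0, half_life_days=90.0):
    """Hypothetical weight for secondary incidental information.

    Elapsed time since capture decays the weight (halving every
    `half_life_days`), while `relevance` (e.g. how much of the later audio
    actually discusses the image) scales it back up. With relevance above 1,
    late information can outweigh capture-time information.
    """
    decay = 0.5 ** (days_since_capture / half_life_days)
    return base_weight * decay * relevance


def add_secondary(array, features, days_since_capture, relevance=1.0):
    # Fold the weighted secondary features into the open-ended sparse array.
    w = secondary_weight(1.0, days_since_capture, relevance)
    for feature in features:
        array[feature] = array.get(feature, 0.0) + w
    return array


# Capture-time audio was about a car; six months later the user discusses Spot at length.
array = add_secondary({"word:car": 1.0}, ["word:spot"], days_since_capture=180, relevance=6.0)
```

Under these assumed settings the later, on-topic comment ends up weighted above the unrelated capture-time chatter.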
Furthermore, the weights of various features in the sparse arrays may be modified based on the user's search preferences. For example, if a user repeatedly performs a certain kind of search, such as continually searching for pictures of their dog, various weights in the sparse arrays may be modified based on that searching activity. In addition, the use of such sparse arrays may enable a user who is later viewing an image to provide a very general natural language search query that is essentially multimodal in nature. For example, suppose the system is presenting picture 402 to the user, it is the only picture being presented, and it includes Donny and his dog Spot. The user may be able to upload or retrieve an image and say or input “Show me other pictures like this one”. This inquiry may generate a short sparse array that uses basic key information about the picture being presented. In this regard, while the stored sparse array associated with image 402 may contain a lot of information, the sparse array generated for the basic query “Show me other pictures like this one” may include only features such as the dog Spot, Donny and the sun shining. The information retrieval process may then use a routing algorithm in which the search sparse array is matched against the more detailed sparse arrays associated with the various images, returning a series of images that are “like this one”, i.e., like the one currently being presented to the user. Also, as noted above, “the image” may encompass both a still image and a video.
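A minimal sketch of "Show me other pictures like this one": take the few strongest features of the image on screen as a short query array, then score every other stored array by the query weights it contains. Selecting the top-k features and scoring by a weighted sum are assumptions made for this example; the image IDs echo the text's picture 402, and the other IDs are invented.

```python
def query_from_image(stored_array, top_k=3):
    """Build a short search array from the strongest features of the presented image."""
    strongest = sorted(stored_array.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return dict(strongest)


def route(query, database, exclude=None):
    """Routing-style match: score each stored array by the query weights it contains."""
    scores = {}
    for image_id, arr in database.items():
        if image_id == exclude:  # do not return the picture already on screen
            continue
        score = sum(w * arr.get(k, 0.0) for k, w in query.items())
        if score > 0:
            scores[image_id] = score
    return sorted(scores, key=scores.get, reverse=True)


db = {
    "402": {"dog:Spot": 3, "person:Donny": 2, "sun": 1, "word:car": 0.5},
    "403": {"dog:Spot": 2, "person:Donny": 1},
    "404": {"truck": 2},
    "405": {"sun": 1},
}
q = query_from_image(db["402"])
like_this_one = route(q, db, exclude="402")
```

Dropping the weak `"word:car"` entry from the query keeps incidental chatter from steering the "like this one" results.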
In the case of video, multiple images within the video may be classified and processed in a manner similar to that discussed above for a single image; the sparse array may simply hold more information. For example, if a user pans from an image of a dog and Donny over to an image of their truck, the sparse array associated with that video snippet may encompass information about a dog, a boy and a vehicle. Furthermore, the individual images that make up the video may be searched separately. In this regard, automatic processing of such a video may be equivalent to having an individual sparse array for each image that makes up the video segment. If a user were being presented with an image of Donny and his dog and asked to “Show me pictures like this one”, the system may engage in a brief dialog with the user to ask whether they are searching for still images only or also for images within video segments. The system may also simply present all images that are found. Thus, if images that are part of a video segment are returned, the user may receive a series of images presenting the individual portions of the video segment in which Donny and his dog appear. This aspect of the invention therefore enables much richer searching of video content than was previously available.
The concept described above also enables a richer sparse array for any individual image in a video scene. For example, in the scene in which the video camera pans from Donny and his dog over to a truck, the sparse arrays associated with the individual images in that pan may be cross-referenced, so that the sparse arrays of images showing only Donny and his dog also contain data associated with the truck, albeit with a lesser or otherwise modified weight; that information may still be valuable for later searching. Accordingly, the above concepts enable gathering, from various sources, incidental information associated with the context in which a picture or video is taken. The context is essentially the world, and the principles disclosed herein provide a way to manage various pieces of incidental information in a sparse array and, via information retrieval mechanisms, to search for and retrieve appropriate images at a later time.
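The per-frame arrays, the video-level array and the cross-referencing of frames within a pan can be sketched together. Here the video-level array takes the maximum weight of each feature over the shot, and each frame inherits features seen elsewhere in the shot at a reduced `spill` weight. The max-union and the 0.3 spill factor are illustrative assumptions, not values from the disclosure.

```python
def cross_reference_frames(frame_arrays, spill=0.3):
    """Hypothetical enrichment of per-frame sparse arrays within one shot.

    Returns the enriched per-frame arrays and a video-level array. Each frame
    keeps its own features at their own weights and also receives features
    seen elsewhere in the shot at a reduced weight (`spill`).
    """
    shot = {}
    for arr in frame_arrays:  # video-level array: strongest weight per feature
        for feature, w in arr.items():
            shot[feature] = max(shot.get(feature, 0.0), w)
    enriched = []
    for arr in frame_arrays:
        out = {feature: spill * w for feature, w in shot.items()}
        out.update(arr)  # the frame's own observations keep their own weight
        enriched.append(out)
    return enriched, shot


def frames_matching(enriched, feature, min_weight=0.5):
    # Individual frames can be searched separately, e.g. frames showing the dog.
    return [i for i, arr in enumerate(enriched) if arr.get(feature, 0.0) >= min_weight]


# A pan from Donny and his dog over to the truck.
frames = [{"boy": 1.0, "dog": 1.0}, {"dog": 0.5, "truck": 0.5}, {"truck": 1.0}]
enriched, video = cross_reference_frames(frames)
```

With these settings the first frame carries a weak "truck" entry, while a frame-level search for the dog still returns only the frames in which it is actually visible.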
Another aspect of the invention relates to the mechanism for performing a search when a user uploads, retrieves or provides an image or series of images and requests “Get me other images that are kind of like these”. Here, the system combines all the sparse arrays associated with the image or images into another sparse array, similar to the routing algorithm notion. The system then processes the user's images based on that combined sparse array. For example, the user may input search terms or requests for pictures of “Donny” and his “dog”. The system may return 15 images, and the user may select, for example, 4 of these images as relevant. The sparse arrays associated with those images are then processed; a simple mechanism is to add the sparse arrays and average them, although other processes may be used as well. After processing the sparse arrays, and perhaps performing other processing in the background, the system uses the new sparse array to search for and present the next group of pictures. Thus, the search using the new sparse array may return additional pictures that were missed in the original search. The system may also bring up more loosely affiliated pictures that include other features, such as siblings or other objects. Combining or processing the sparse arrays associated with the 4 images yields a stronger set of semantics that may modify or adjust values or weights at various positions in the sparse array, so that the ultimate search is improved using the new sparse array.
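The "add the sparse arrays and average them" step, followed by a fresh search that skips images already shown, can be sketched as follows. The weighted-sum scoring used for the second round and all image IDs and feature names are hypothetical.

```python
def combine_selected(arrays):
    """Add the sparse arrays of the user-selected images and average them."""
    combined = {}
    for arr in arrays:
        for feature, w in arr.items():
            combined[feature] = combined.get(feature, 0.0) + w
    n = len(arrays)
    return {feature: w / n for feature, w in combined.items()}


def next_round(combined, database, already_shown):
    """Search again with the combined array, skipping images already presented."""
    scores = {}
    for image_id, arr in database.items():
        if image_id in already_shown:
            continue
        score = sum(w * arr.get(k, 0.0) for k, w in combined.items())
        if score > 0:
            scores[image_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Four images the user marked as relevant to "Donny" and his "dog".
selected = [
    {"Donny": 1, "dog": 1},
    {"Donny": 1, "dog": 1, "sister": 1},
    {"Donny": 1},
    {"dog": 1, "ball": 1},
]
combined = combine_selected(selected)
more = next_round(combined, {"a": {"sister": 1}, "b": {"car": 1}, "c": {"Donny": 1}}, {"c"})
```

Features that appeared in only some selected images, such as the sister, survive with reduced weight, which is how the second round can surface more loosely affiliated pictures.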
Embodiments within the scope of the present invention may also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.
Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps. Program modules may also comprise any tangible computer-readable medium in connection with the various hardware computer components disclosed herein, when operating to perform a particular function based on the instructions of the program contained in the medium.
Those of skill in the art will appreciate that other embodiments of the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
Although the above description may contain specific details, they should not be construed as limiting the claims in any way. For example, any concept disclosed herein for sparse arrays, such as their infinite nature to receive further data, may be applied to different data structures used to store data associated with an image. Other configurations of the described embodiments of the invention are part of the scope of this invention. Accordingly, the appended claims and their legal equivalents should only define the invention, rather than any specific examples given.
Number | Date | Country |
---|---|---|
20090132467 A1 | May 2009 | US |