The present invention relates generally to systems and methods for searching data files and, more particularly, to systems and associated devices, methods, and computer program products for performing metadata-based searches of data files using a small display and a limited user interface.
Mobile devices, including mobile communication devices and terminals, and wireless communication technologies have advanced significantly over the past few decades. In keeping stride with the advancement and impact of mobile devices, new wireless systems, devices, protocols, and services are developed and introduced to further the use of these technologies, and consumers continue to demand even more advanced wireless functionality and capabilities. Such technologies far surpass simply allowing voice communications and include, for example, text messaging, multimedia messaging and communications, e-mail, Internet browsing, music, and access to a wide range of wireless applications and services. Recent systems, including third generation (3G) systems, such as those specified for use with the Global System for Mobile communications (GSM) wireless standard, enable the delivery of new digital services not previously available or prohibitively inefficient in earlier second generation (2G) wireless networks.
These improved technologies also present the need for increased processor capacity and storage for the increasing amounts of data that may now be transmitted to a mobile device. Mobile devices have improved as storage devices, and mobile devices now provide increased storage capacity for data files such as email, email attachments, web pages, images, music, and other files such as multimedia files which can be transmitted on 3G systems. Improved storage devices and increased storage capacities also result in increasing numbers of data files. As the number of data files on a mobile device increases, accessing a particular data file or group of data files becomes increasingly difficult, less efficient, and, eventually, prohibitive for effective use of the mobile device as a storage medium.
This problem is particularly relevant for devices with small display screens such as mobile telephones, MP3 players, personal digital assistants, and devices that represent a combination of these and other personal and wireless technologies. Because of growing storage capacities, even devices with small screens that traditionally had limited storage capacities can, nevertheless, contain large quantities of data and numbers of data files, such as media files or media items. In addition to the problem of small displays, physical and software user interfaces are typically limited on many personal and wireless devices. If the physical user interface is not going to change, the manner of using the device and associated software user interface can be changed to improve the ability to manage increasing amounts of stored data.
Example previous and existing file management systems have relied upon folder-based management with searching or filtering based upon data file characteristics such as name, date, type, and size. Other approaches have taken advantage of searching metadata fields of data files. But these file management systems rely primarily upon presenting search results in a flat list such as the flat list of results shown in
In light of the foregoing background, embodiments of the present invention provide an improved system and associated terminal, method, and computer program product for performing metadata-based searches of data files using a small display and a limited user interface.
Embodiments of the present invention improve upon existing data search and management systems by displaying the initial results of a search, or filter, as clusters depending upon the type of matches or filtered items that result. Using clusters provides an intuitive way of displaying results on a compact device with a small screen and limited user interface. Embodiments of the present invention also provide a user the ability to refine a search, the results of a search, and the clusters that are produced to organize and present the results.
Embodiments of the present invention provide a system, associated device or mobile terminal, method, and computer program product for performing metadata-based searches. An embodiment of a system of the present invention may include a memory for storing data files with associated metadata and a processor for searching the metadata files, producing results with hits associated with the search, clustering the results based upon the metadata, and displaying the clustered results. The results and clustered results can be refined by further searching of the data files or the results and clustered results. An embodiment of a system of the present invention may also include an input device and a display. The input device may be used to enter search criteria such as characters, character strings, and search operators.
An embodiment of a mobile terminal of the present invention may include a memory and processor similar to the previously described system embodiment and may also include a search application for operating on the processor based upon character input from an input device and providing the results and clustered results to a display of the mobile terminal.
An embodiment of a method of the present invention may include the steps of searching metadata associated with data files based upon search criteria character input to produce search results and clustering the search results to produce result clusters. The character input may be a single character, a character string, combinations of characters and character strings, or multiple character strings. Characters and character strings are defined herein to include any character, symbol, or other representative unit, including spaces, such as characters typically found on a keyboard or alphanumeric keypad or identified as ASCII, but are not limited to alphanumeric characters. A character string may be a single character if the character string is a separate search criteria from another character or character string, such as “n” and “jun” in
In an embodiment of a method of the present invention where results are clustered, the clustering may be performed based upon a predetermined display capability such as the ability to show a particular number of results or result clusters on a display device. The clustering of results may alternatively or additionally be based upon the metadata of the search results. The metadata may provide clustering based upon, for example, a period of time, an event, or a topic of the metadata or based upon a physical location of data files in the search results. Clustering may also or alternatively be based upon a predetermined maximum result cluster size, meaning the number of results that are included in a result cluster or a maximum percentage or a combination of percentage or number of results that are included in a result cluster. Clustering may also or alternatively be based upon relevancy to search criteria, such as the number of hits of search criteria in each of the search results. In addition to clustering, results and result clusters may be ordered for display such as by relevancy to search criteria, alphabetical listing, a period of time, a date, time stamp, a sender, a creator, an owner, an event, a topic, a location, or the relevant weight of a data file or a number of data files.
An embodiment of a computer program product of the present invention is also provided that includes metadata searching for results using result clusters to organize the search results that would otherwise be displayed in a flat result list. As described with reference to embodiments of methods of the present invention, embodiments of computer program products of the present invention include the ability to accept additional search criteria character input to refine search results and result clusters that are displayed. These characteristics, as well as additional details, of the present invention are further described herein with reference to these and other embodiments.
Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
The present inventions now will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, these inventions may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like numbers refer to like elements throughout.
While a primary use of the present invention may be in the field of mobile phone technology, it will be appreciated from the following description that the invention is also useful for many types of devices that are generally referenced herein as mobile terminals, including, for example, handheld data terminals and personal data assistants, portable medical devices, personal multimedia units such as video or audio players (e.g., MP3 players), handheld PC devices, digital cameras, digital camcorders, portable TV devices, computer watches, and other portable electronics, including devices that are combinations of the aforementioned devices. Similarly, one of ordinary skill in the art will recognize that, while the present invention is particularly useful for devices with small screens and limited physical user interfaces, the present invention can be used for searching or managing data files on other devices and systems.
One aspect of the present invention is the use of metadata in searches to determine or partially determine search results. As used herein, metadata means both the traditional data identified as metadata fields in data files, such as a description field and tagged information, and also traditional non-metadata field information such as a file name, time stamp, and other information related to the data file, including a log of actions that are related to or that have been applied to the file, whether included as part of the data file or stored separately such as in an external database. Metadata is intended to mean the collective total of all information associated with the data file. For example, an image data file would include an image but the metadata related to the file would also include all other information related to the image, including metadata stored in the data file such as EXIF or IPTC fields inside a JPEG image file. Further, metadata may include text included in a data file such as a multimedia message where a sender provides text explaining where an attached image was captured; similar location metadata may be stored in metadata fields associated with the attached image data file. By way of further explanation and example, an image data file may have associated metadata including a file name, file size, date and time, description, image resolution and size, image type, and copyright information separate and apart from the image itself.
A second aspect of the present invention is the use of clustering of search results to further refine and improve the efficiency of searching and the provisioning of the results of the search, particularly on devices with small displays and limited user interfaces. With respect to the metadata-based searching aspect of embodiments of the present invention,
In addition to textual metadata search criteria, characters and character strings may represent any type of metadata associated with data files. Search criteria may be defined by character input such as characters, character strings, and operators to search any type of metadata field such as a textual description, keywords, location, periods of time, dates, time, sender, creator, owner, events, topics, and weights of activity of data files. For example,
As previously mentioned, search criteria may be a single character as represented in
Search results may be organized or clustered into result clusters, as described further herein. A user may select or open a result cluster to display the data files or results which have been organized or grouped into the result cluster. For example,
One of ordinary skill in the art will identify that search criteria may include search operators such as a space character that indicates an AND or an OR operator between search criteria characters and/or character strings. Any search operator may be used, for example, AND, +, OR, NOT, −, LIKE, ˜, PRE/n, W/n, W/M, ATLEAST/n/X, “,”, “;”, !, *, #, and ?. Any search operation may be assigned to or configured for any operator. However, by way of example, the following example operators are further defined with example meanings: + may mean inclusive similar to AND, − may be exclusive similar to NOT, LIKE and ˜ may identify synonyms or similar character strings, PRE/n may be used where a first search criteria character string precedes a second search criteria character string by not more than n characters or character strings, W/n may be used where search terms are within n characters or character strings of each other, W/M may be used where search terms are in the same metadata field, ATLEAST/n/X may be used where search term X appears at least n times in the metadata of a data file, a “,” (comma) or a “;” may be used as an AND or an OR operator, ! may mean an exact match, * may represent a placeholder for one or more characters at the beginning or end of a search term or may be used with a root word to match all the words made by adding letters to the beginning or end, # may represent a placeholder for any single character, and, as with any other symbol or operator, ? may be assigned as any desired operator.
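By way of a non-limiting illustration, a few of the example operators described above might be evaluated as in the following Python sketch. The function names `term_to_regex` and `matches` are hypothetical, the ASCII hyphen stands in for the − operator, and the sketch assumes that a space means AND and that all metadata fields are searched together; other assignments of operators are equally possible.

```python
import re

def term_to_regex(term):
    """Translate one search term into a case-insensitive regular expression.

    Assumed meanings: '*' is a placeholder for any run of characters and
    '#' is a placeholder for exactly one character.
    """
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")
        elif ch == "#":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(query, metadata):
    """Return True if the metadata (a dict of field name to text) satisfies the query.

    Assumed meanings: a space is AND, a leading '+' is inclusive,
    and a leading '-' is exclusive (NOT).
    """
    text = " ".join(str(value) for value in metadata.values())
    for token in query.split():
        exclude = token.startswith("-")
        term = token.lstrip("+-")
        if not term:
            continue
        found = term_to_regex(term).search(text) is not None
        if found == exclude:
            # An inclusive term is missing, or an excluded term is present.
            return False
    return True
```

For instance, under these assumptions the query `jun* -office` would accept a data file whose description contains "June" and whose metadata does not contain "office".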
A search may be configured in a routine or by operation of a user to search all of the characters and/or character strings throughout all of the metadata fields of associated data files. Alternatively, search criteria may be separated by operators such as space characters or commas to search or focus on different metadata fields or, alternatively, this function may be defined in general options. For example, a first search criteria keyword may be searched through all of the metadata fields for occurrences, and a second keyword may be searched through all of the remaining metadata fields in which the first search criteria keyword did not appear. Effectively, each search criteria may identify a particular metadata field or group of metadata fields. For example, a user may want to identify data files with associated metadata where two search terms appear in two different metadata fields. A user can enter two search terms, and if this type of search is being used, the search routine will only identify data files where the two search terms appear in two different metadata fields of a data file. In such a manner, a user may be able to aggressively refine the search results in result clusters in order to quickly identify a particular data file or group of data files. In one alternate embodiment of the present invention, a search may also be restricted to contain only a certain file type, selected file types, or all file types, where types may be defined by a particular category such as images or music, a particular file extension such as .gif or .jpg, or any other file type scheme. One of ordinary skill in the art will recognize that file types may be used to cluster and/or order results of a search, as described further herein.
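The search in which two search terms must appear in two different metadata fields can be illustrated with the following Python sketch. The names `fields_containing` and `matches_in_distinct_fields` are hypothetical, and plain substring matching is assumed for simplicity.

```python
from itertools import permutations

def fields_containing(term, metadata):
    """Return the names of the metadata fields in which the term occurs."""
    term = term.lower()
    return {name for name, value in metadata.items() if term in str(value).lower()}

def matches_in_distinct_fields(terms, metadata):
    """Return True if each term can be matched in its own, separate metadata field."""
    candidates = [fields_containing(term, metadata) for term in terms]
    if any(not fields for fields in candidates):
        return False  # at least one term occurs nowhere
    all_fields = set().union(*candidates)
    # Try every assignment of distinct fields to the terms, in order.
    for assignment in permutations(all_fields, len(terms)):
        if all(field in fields for field, fields in zip(assignment, candidates)):
            return True
    return False
```

A data file whose description contains both terms but whose other fields contain neither would be rejected by this routine, which is the narrowing effect described above.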
One of ordinary skill in the art will recognize that embodiments of the present invention may search a composite data file as a single file and as separate files, only as a single file, or only as separate files. Some media files can be combined together into composite productions, or more general media items. For example, a composite media file such as a MMS message may include text, video, audio, images, web pages, other media files, and combinations of these and other media files. A search may identify matches within the composite media file and/or the separate files and include as results of the search the composite media file and separate files, only as the single composite media file, or only as the separate files. If only the single composite media file will be represented as a result, an embodiment of the present invention may search the composite media item as if it has inherited the metadata of the data files composed therein. One of ordinary skill in the art will also recognize that certain combinations of media items may be interpreted as metadata of other media items. For example, a composite media item of text, such as a greeting message, and an image, such as a holiday picture, may be interpreted so that the text part of the media item is inherited or annotated as metadata of the image.
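As a minimal sketch of searching a composite media item as if it has inherited the metadata of its parts, the following Python example merges the metadata of the parts into the composite item before matching. The dictionary layout (`"metadata"` and `"parts"` keys) and the function names are assumptions made only for illustration.

```python
def inherited_metadata(composite):
    """Merge the composite item's own metadata with the metadata of its parts.

    Each field maps to a list of values so that values from different
    parts are all preserved.
    """
    merged = {field: [value] for field, value in composite.get("metadata", {}).items()}
    for part in composite.get("parts", []):
        for field, value in part.get("metadata", {}).items():
            merged.setdefault(field, []).append(value)
    return merged

def search_composite(term, composite):
    """Return True if the term occurs anywhere in the inherited metadata."""
    term = term.lower()
    merged = inherited_metadata(composite)
    return any(term in str(value).lower()
               for values in merged.values()
               for value in values)
```

With this arrangement only the single composite item is reported as a result, even when the match occurs in the metadata of an embedded image or text part.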
As previously described, matches or occurrences in metadata fields of search criteria may be displayed as hits associated with results and result clusters in result lists and result cluster lists. The number of data files that match a particular selection of a result cluster or search criteria may be displayed for each result cluster and/or each list. For example, the number of identified result clusters may be indicated and the number of results in a result cluster may be indicated. Further, thumbnails or icons may be used to further identify the results and result clusters in a list. The example embodiments of
In addition to searching by metadata associated with data files, embodiments of the present invention provide for grouping or ordering of search results into result clusters, or clustering. For example, a multi-dimensional metadata-structure or “categorization scheme” related to all of the matched data files may be analyzed by a clustering routine to determine alternative views or clusters to present the resulting data files from a metadata search. The multi-dimensional metadata-structure relates to the plurality of metadata fields which may be identified as having matches or occurrences of search criteria. This plurality of metadata fields forms a multi-dimensional metadata-structure. The resulting data files may be prioritized or grouped based upon occurrences in similar metadata fields or related multi-dimensional metadata-structure of the resulting data files. A view may be, for example, combinations of time, date, and periods of time parameters into a single cluster. Similarly, event and topic metadata field occurrences may be clustered. However, one of ordinary skill in the art will recognize that clusters may be organized to represent resulting data files based upon any number of characteristics and metadata fields. Additional example metadata categories or clusters may include metadata fields or combinations of metadata fields such as time, location, events, topics, time-location, time-event, time-topic, event-location, and relevancy to search criteria. For example, if a search string of “August” matches data files from both August 2002 and August 2003, the data file matches may be clustered into two groups representative of data files for August 2002 and data files for August 2003. Search results may also be clustered based upon metadata that may not be part of the search criteria or not found but is otherwise represented in the resulting files.
For example, the results from a search string of “August” may be clustered based upon location metadata information such as “summer cottage,” “office,” and “home.” One of ordinary skill in the art will recognize that various alternative clustering methods may also be utilized for grouping search results of embodiments of the present invention. Further, clusters may be identified or named using search criteria keywords or occurrences of search criteria in metadata fields.
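The “August” example above can be sketched in Python as follows. The function names `search` and `cluster_by`, the field names, and the sample data files are hypothetical; the sketch simply groups the same search results first by year and then by location metadata that was not part of the search criteria.

```python
from collections import defaultdict
from datetime import date

def search(files, term):
    """Return the data files whose metadata contains the term in any field."""
    term = term.lower()
    return [f for f in files if any(term in str(value).lower() for value in f.values())]

def cluster_by(results, key):
    """Group result file names into clusters labeled by key(file)."""
    clusters = defaultdict(list)
    for f in results:
        clusters[key(f)].append(f["name"])
    return dict(clusters)

# Hypothetical sample data files with their associated metadata.
files = [
    {"name": "a.jpg", "description": "August picnic", "taken": date(2002, 8, 3), "location": "summer cottage"},
    {"name": "b.jpg", "description": "August meeting", "taken": date(2003, 8, 12), "location": "office"},
    {"name": "c.jpg", "description": "August sunset", "taken": date(2003, 8, 20), "location": "summer cottage"},
    {"name": "d.jpg", "description": "May picnic", "taken": date(2003, 5, 1), "location": "home"},
]

hits = search(files, "august")
by_year = cluster_by(hits, lambda f: f["taken"].year)
by_location = cluster_by(hits, lambda f: f["location"])
```

Here `by_year` holds one cluster for 2002 and one for 2003, while `by_location` presents the same three results as a “summer cottage” cluster and an “office” cluster.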
A clustering routine may be configured to organize a limited number of clusters that contain only a representative selection of data files including a high density of search criteria among all of the results from a search, discarding less relevant data files or data files with low density of search criteria in the data file metadata. Clustering of results may also be based upon the search criteria that are entered. For example, result clusters may be organized to show all of the results from one search criteria character string in a single result cluster and all of the results from a second search criteria character string in a second result cluster. A third result cluster may be representative of all of the data files with occurrences of both character strings.
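Clustering based upon the entered search criteria, with one result cluster per character string and a third for data files matching both, might look like the following Python sketch. The function name `cluster_by_criteria`, the cluster labels, and the data file layout are assumptions for illustration only.

```python
def cluster_by_criteria(files, first, second):
    """Build three result clusters: matches of first, of second, and of both."""
    def occurs(term, data_file):
        term = term.lower()
        return any(term in str(value).lower() for value in data_file["metadata"].values())

    both_label = f"{first} + {second}"
    clusters = {first: [], second: [], both_label: []}
    for data_file in files:
        in_first = occurs(first, data_file)
        in_second = occurs(second, data_file)
        if in_first:
            clusters[first].append(data_file["name"])
        if in_second:
            clusters[second].append(data_file["name"])
        if in_first and in_second:
            clusters[both_label].append(data_file["name"])
    return clusters
```

A data file matching both character strings appears in all three clusters in this sketch; an implementation could just as well list such files only in the third cluster.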
Embodiments of the present invention may also include result clusters that contain extended search results and/or hits, meaning results and/or hits that do not strictly meet the search criteria but have metadata similar to that of the actual search results and/or hits. Actual search results and/or hits means search results and/or hits representative of exact matches to the search criteria. Extended search results and/or hits may be clustered to be presented separately from actual search results and/or hits or sorted to be secondary to actual search results and/or hits. Searching for results and/or hits similar to but not exact matches to the search criteria may be useful when metadata may be incomplete. For example, if a user searches a specific period or moment in time, such as a day, and a keyword, such as “in-laws,” to find a picture of the new in-laws of the user taken on the wedding day of the user, the matched results could include a first data file picture having both search criteria, a second data file being the previous picture captured as indicated by time metadata, and a third data file being the next picture captured as indicated by time metadata. Similarly, clusters may be formed of data files matching a period of time, or other characteristic, closely related to the search criteria such as data files captured one hour before or one hour after a searched hour. Another example may be where a first cluster includes exact matches to the search criteria and a second cluster includes similar but not exact matches such as the day of the search criteria but not the time of the search criteria. A further example may be where the search routine has been configured to look for synonyms and plurals of search criteria such as where a search for brown also results in matches of tan and beige.
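The “in-laws” example above, in which the pictures captured immediately before and after an actual hit are offered as extended results, can be sketched in Python as follows. The function name `actual_and_extended` and the `"captured"` and `"keywords"` field names are hypothetical, and for brevity the sketch matches on the keyword only rather than on a keyword and a day.

```python
from datetime import datetime

def actual_and_extended(files, keyword):
    """Split results into actual hits and extended hits.

    Extended hits are the data files captured immediately before or after
    an actual hit, as indicated by capture-time metadata.
    """
    ordered = sorted(files, key=lambda f: f["captured"])
    hits = [i for i, f in enumerate(ordered) if keyword.lower() in f["keywords"].lower()]
    hit_set = set(hits)
    actual = [ordered[i]["name"] for i in hits]
    extended = []
    for i in hits:
        for j in (i - 1, i + 1):
            if 0 <= j < len(ordered) and j not in hit_set:
                name = ordered[j]["name"]
                if name not in extended:
                    extended.append(name)
    return {"actual": actual, "extended": extended}
```

The two lists may then be presented as separate result clusters, or the extended list may be sorted to appear after the actual results.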
Clustering criteria may also or alternatively include considerations of limitations of a display such as the size or resolution of the display to limit the number of result clusters to the number capable of being displayed or a number of result clusters not significantly greater than the number of result clusters that may be displayed, such as twice or three times the display capability. For example, a user interface of a mobile device may be physically small in size, have a low resolution, or a combination of both such that the display may only present a limited amount of information without requiring scrolling. Depending upon the size or resolution of a display, a search routine or a clustering routine may be configured to provide, for example, two to six data files or result clusters. Thumbnails, icons, and text or metadata information such as occurrences of search criteria in the metadata may be presented with the limited number of results or result clusters. Parameters of a thumbnail image or icon may further limit the amount of information that may be displayed on a mobile device. All of this information may be taken into consideration by a search routine or a clustering routine to provide result lists and result cluster lists that are easily navigable by a user and present information in an efficient manner. For example, the embodiments of
Once result clusters have been determined and populated with search results, the result clusters may be displayed to a user indicating various characteristics of the result cluster such as the number of results in the cluster, hits of search criteria in the metadata, and representative thumbnails or icons of results in the result cluster. Additionally, icons describing different metadata fields may be displayed. As previously described, a user may then select a result cluster to review the results in the cluster, further search the result clusters, or refine the entire search while preserving the identified result clusters and results.
If large numbers of result data files or result clusters are organized, embodiments of the present invention may perform multilevel clustering.
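One possible form of multilevel clustering is sketched below in Python: a cluster that holds more results than a predetermined maximum is subdivided using the next metadata key, for example first by year and then by month. The function name `multilevel`, the `max_size` parameter, and the choice of keys are assumptions and are not prescribed by this description.

```python
from collections import defaultdict

def multilevel(results, keys, max_size):
    """Cluster results level by level.

    A group small enough to display (or with no keys left) becomes a leaf,
    represented as a list of file names; otherwise it is split by the next
    key function and each subgroup is processed in the same way.
    """
    if len(results) <= max_size or not keys:
        return [f["name"] for f in results]
    groups = defaultdict(list)
    for f in results:
        groups[keys[0](f)].append(f)
    return {label: multilevel(members, keys[1:], max_size)
            for label, members in groups.items()}
```

A user would then open a top-level result cluster to see either its results or a further list of result clusters, depending on how many results it contains.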
One of ordinary skill in the art will recognize that the result clusters and data file results of result clusters may be organized or sorted according to different criteria. For example, the alphabetical name of the result data files or the level provided for a result cluster may be used to organize the list. Alternative organization criteria may include such characteristics as the relevancy to search criteria, periods of time, dates, times, sender, creator, owner, events, topics, locations, and weighted activity of data files. Any number of metadata fields or characteristics of data files may be used to order results and result clusters. The weighted activity of a data file is intended to describe quantity characteristics of a data file such as the number of times a data file has been accessed, displayed, emailed or sent, printed, or edited. Relevancy to search criteria may include any number of characteristics such as the quantity or number of occurrences of search criteria in metadata fields, the number of metadata fields in which the search criteria occurs, and various other traditional search relevancy determinations.
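As an illustrative sketch of ordering results, the following Python example ranks data files by relevancy to search criteria (the number of occurrences, then the number of metadata fields with occurrences) and breaks ties by weighted activity. The function names, the `"activity"` counters, and the equal weighting of each kind of activity are assumptions; any other ordering criteria listed above could be substituted.

```python
def relevancy(data_file, terms):
    """Return (total occurrences, number of metadata fields with an occurrence)."""
    occurrences = 0
    fields_hit = 0
    for value in data_file["metadata"].values():
        text = str(value).lower()
        count = sum(text.count(term.lower()) for term in terms)
        occurrences += count
        if count:
            fields_hit += 1
    return (occurrences, fields_hit)

def weighted_activity(data_file):
    """Sum the activity counters (e.g., times viewed, sent, printed, edited)."""
    return sum(data_file.get("activity", {}).values())

def order_results(files, terms):
    """Order results by relevancy first and weighted activity second, highest first."""
    return sorted(files,
                  key=lambda f: (relevancy(f, terms), weighted_activity(f)),
                  reverse=True)
```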
One of ordinary skill in the art will recognize that text on nine keys (T9) can be used to improve the efficiency of searching of embodiments of the present invention. T9 is a system that allows users to enter words and phrases by pressing a number key for each letter in the word or phrase. The system is similar to entering the letters of a name when looking up someone in a company's phone directory over a phone. T9 was developed as a faster alternative to multi-tapping, a text input system that requires the user to tap a key from two to four times to select many letters. In contrast, T9 uses predictive text input and predictive software to enable identification of a letter with a single key tap. T9 compares the single key tap input to a list of possible words in a dictionary. According to embodiments of the present invention, a specific dictionary containing all of the words that occur in metadata of the data files on a storage device, or metadata that appears in search results, may be compiled for use with a T9 system. If the T9 system is not able to take advantage of or is not able to identify hits or words within this limited metadata dictionary, the T9 system may fall back to a standard or default dictionary. When using a T9 system, a separate dialogue or input box may be opened and used on the display of a mobile device for inputting a character string such as a word or phrase and/or search operators using the T9 system, and then, after the T9 system has been used to input search criteria, that inputted search criteria may be selected as search criteria for the metadata-based search. One of ordinary skill in the art will also recognize that various other functionalities for searching may be used such as speech recognition. Additional technologies may also utilize a compiled metadata dictionary such as to improve the recognition accuracy of speech recognition.
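The use of a compiled metadata dictionary with fallback to a default dictionary can be illustrated by the following simplified Python sketch. It models only the lookup of a complete key sequence against the standard telephone keypad letter assignment and is not the actual T9 implementation; the function names and sample words are hypothetical.

```python
# Standard telephone keypad letter assignment.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items() for letter in letters}

def to_digits(word):
    """Return the key sequence for a word, ignoring characters not on the keypad."""
    return "".join(LETTER_TO_DIGIT.get(ch, "") for ch in word.lower())

def build_dictionary(words):
    """Index words by key sequence, e.g. compiled from words found in metadata."""
    index = {}
    for word in words:
        index.setdefault(to_digits(word), []).append(word)
    return index

def lookup(digits, metadata_index, default_index):
    """Prefer candidates from the metadata dictionary; otherwise fall back to the default."""
    return metadata_index.get(digits) or default_index.get(digits, [])
```

Because the metadata dictionary is consulted first, a key sequence shared by several common words resolves to the word that actually occurs in the metadata of the stored data files.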
One of ordinary skill in the art will also recognize that embodiments of the present invention may be used for other applications apart from mobile devices such as searching on the Internet. The dynamic searching and clustering and refinement of searching and clustering may be used for other applications such as messaging or email and media such as music and video files. Embodiments of the present invention provide additional features for various management applications of multiple data files. For example, in a messaging or email system, messages may be searched based upon the various characteristics or metadata of the emails such as title or subject, sender, recipient, date, and message body to search and cluster the messages that would be otherwise searched and provided in a resulting flat list of messages. In general, the present invention may be utilized to manage and search any type or kind of file.
One of ordinary skill in the art will also recognize that the present invention may be incorporated into software systems and subsystems, as well as various other applications. In each of these systems as well as other systems, including dedicated systems, capable of hosting the system and method of the present invention as described above, the system generally can include a computer system including one or more processors that are capable of operating under software control to provide the metadata-based searching and clustering techniques described above.
As shown in
It will be understood that each block, or step, or element of the flowchart of
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.