The features and advantages of the present invention will become apparent from the following detailed description of the present invention in which:
Embodiments of the present invention assist in generating hierarchical semantic databases that augment multimedia data collections, and their limited associated semantic tags, by automatically determining categories for named entities. In some applications, such as personal digital image or video collections, named entities (e.g., John, Berlin, Peter's 21st birthday party) constitute on average more than two thirds of the succinct tags that users enter to annotate individual items or portions of their collections. This reflects the fact that a typical digital multimedia collection is personal, so its emphasis is on individual-specific semantic content (e.g., family, friends, vacations, events, etc.). A solution to the named entity recognition problem is therefore very useful for personal multimedia databases.
Embodiments of the present invention comprise a method for automatically grouping the named entities present in personal multimedia databases into a set of basic ontologies covering general, universally acceptable categories, such as people, places, and events. An ontology is a hierarchical structuring of knowledge about things that subcategorizes them according to their essential (or at least relevant and/or cognitive) qualities. Given a named entity as input, the present approach fuses semantic clues obtained from multiple heterogeneous online and offline reference resources to automatically determine the likelihood that the named entity belongs to a particular category. In one embodiment, information from online reference resources may be cached locally on the user's processing system to achieve real-time performance without loss of accuracy. Supervised machine learning methods may be used to design a set of classifiers for named entities and to fuse their outputs to determine the general category of the named entity being processed. In one embodiment, an interactive learning algorithm may then be applied that allows the user to extend, modify, and adjust the automatically generated categories.
Reference in the specification to “one embodiment” or “an embodiment” of the present invention means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the phrase “in one embodiment” appearing in various places throughout the specification does not necessarily always refer to the same embodiment.
When the present invention is used in conjunction with a personal multimedia application (used to store, retrieve, and render multimedia data), input text entered by the user (or tags or other text extracted from the data) may direct the application to find all multimedia data in the user's collection that is associated with that input text. By determining which category the input text relates to, the application may be able to find relevant multimedia data items (e.g., images, videos, songs, other sound files, etc.) in the collection more quickly and accurately.
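As a minimal sketch of why the category helps, assume a hypothetical in-memory layout in which each item id maps to its user tags and each tag has already been assigned a category. The lookup can then be restricted to the sub-index for that category. The names `build_index` and `find` are illustrative only and are not part of any described embodiment.

```python
from collections import defaultdict


def build_index(items: dict[str, set[str]], tag_category: dict[str, str]) -> dict:
    """Group item ids by category, then by lower-cased tag.

    items: item id -> set of user tags (hypothetical layout).
    tag_category: lower-cased tag -> category decided by the NER classifier.
    """
    index: dict = defaultdict(lambda: defaultdict(set))
    for item_id, tags in items.items():
        for tag in tags:
            key = tag.casefold()
            index[tag_category.get(key, "unknown")][key].add(item_id)
    return index


def find(index: dict, text: str, category: str) -> list[str]:
    """Look up the text only within the sub-index for its category."""
    return sorted(index.get(category, {}).get(text.casefold(), set()))
```

For example, once "Berlin" is categorized as a place, `find(index, "Berlin", "place")` touches only the place entries rather than every tag in the collection.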
At block 302, one or more queries may be generated based on the input text (in one embodiment, based on its head noun). Each query may be generated to conform to the known query syntax of a particular reference resource, whether online or offline. For example, a query to a website may be formatted as a hypertext transfer protocol (HTTP) request. In one embodiment, many queries may be generated, with each query being sent to a specific website.
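Query generation at block 302 might look like the following minimal Python sketch. The URL templates, resource names, and the `head_noun` heuristic (simply taking the last token) are hypothetical placeholders: real reference resources each define their own query syntax, and a real head-noun extractor would use a parser.

```python
from urllib.parse import quote

# Hypothetical URL templates; each real resource defines its own query syntax.
QUERY_TEMPLATES = {
    "name_db": "https://names.example.org/lookup?q={term}",
    "gazetteer": "https://gazetteer.example.org/search?name={term}",
}


def head_noun(text: str) -> str:
    """Crude stand-in for head-noun extraction: the last whitespace-separated token."""
    tokens = text.strip().split()
    return tokens[-1] if tokens else ""


def generate_queries(text: str) -> dict[str, str]:
    """Build one HTTP GET URL per reference resource, percent-encoding the term."""
    term = quote(head_noun(text))
    return {name: tmpl.format(term=term) for name, tmpl in QUERY_TEMPLATES.items()}
```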
At block 304, the queries may be submitted to a plurality of heterogeneous online and/or offline reference resources. A reference resource comprises a website, database, application program, or other information repository that can accept a query for information and return an appropriate response. In one embodiment, many heterogeneous reference resources may be used, such as the semantic lexicon “WordNet” (publicly available from Princeton University), which may be stored offline (i.e., locally available); a computerized dictionary, almanac, gazette or gazetteer, or name database; and online websites such as “Behind the Name,” “Answers,” and “World Gazetteer.” Many other online and offline reference resources may be used. In one embodiment, information from a reference resource may be cached locally to provide fast access.
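Local caching of a reference resource can be as simple as memoizing its query function. The sketch below keeps the cache in memory and assumes a hypothetical `fetch` callable that performs the real (e.g., HTTP) lookup; a persistent store such as Python's `sqlite3` or `shelve` could replace the dictionary so cached answers survive restarts.

```python
from typing import Callable


class CachedResource:
    """Wraps the query function of one reference resource with a local cache."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # hypothetical: performs the real remote lookup
        self._cache: dict[str, str] = {}

    def query(self, term: str) -> str:
        key = term.casefold()  # normalize so "Berlin" and "berlin" share one entry
        if key not in self._cache:
            self._cache[key] = self._fetch(term)  # only cache misses hit the resource
        return self._cache[key]
```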
At block 306, the responses to the queries may be received, and a vector may be generated based at least in part on the responses. That is, the textual responses may be converted into a vector of numbers, which is a numeric representation of the query results that a classifier can process.
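One possible way to turn textual responses into a fixed-length numeric vector is sketched below. It assumes a hypothetical, fixed list of resources and hand-picked category cue words; the features actually used by an embodiment are not specified here. Each resource contributes a hit flag plus one cue-word count per category, and a missing response contributes zeros so the vector layout never changes.

```python
import re

# Hypothetical cue words per category; a real system would curate or learn these.
CATEGORY_CUES = {
    "person": {"name", "given", "surname", "born"},
    "place": {"city", "country", "town", "population"},
    "event": {"party", "festival", "wedding", "anniversary"},
}
RESOURCES = ["lexicon", "name_db", "gazetteer"]  # fixed order gives a fixed layout


def response_features(text: str) -> list[float]:
    """Hit flag followed by one cue-word count per category."""
    words = re.findall(r"[a-z]+", text.lower())
    feats = [1.0 if words else 0.0]
    for cues in CATEGORY_CUES.values():
        feats.append(float(sum(w in cues for w in words)))
    return feats


def build_vector(responses: dict[str, str]) -> list[float]:
    """Concatenate per-resource features; absent responses become zeros."""
    vec: list[float] = []
    for res in RESOURCES:
        vec.extend(response_features(responses.get(res, "")))
    return vec
```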
At block 308, classification may be performed based at least in part on the vector of numbers generated at block 306 and a set of model parameters, to produce a category decision. The model parameters comprise support vectors and associated weights, and may be set up during a training phase for the classifier. The classifier may be represented by several sets of weights (one per category), and the predictive estimate for a given category is computed as a linear combination of the vector representation of the query response and the classifier weights. The NER system may also use sample queries posed to the user to adjust the model parameters. In one embodiment, the classifier comprises a known support vector machine-based classifier that takes a linear combination of the vector quantities constructed at block 306 and the model parameters to produce a positive or negative number indicating the likelihood that the input text matches a specific category (e.g., people, place, event, etc.). In one embodiment, there may be a separate classifier for each category. In another embodiment, a single classifier may be configured to perform multi-class classification. Each category decision may be displayed to the user, used to search the personal multimedia collection, or used for other purposes.
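For a linear-kernel SVM, the support vectors s_i and their signed weights c_i (alpha_i times the label y_i) can be folded into one weight vector w = sum_i c_i s_i, so each category's score is the linear combination w·x + b described above. The sketch below assumes one-vs-rest classifiers with hypothetical, already-trained parameters, and it reports no category when every score is non-positive; that threshold is an assumption.

```python
from typing import Optional

# Hypothetical trained parameters: category -> (weight vector, bias).
Model = dict[str, tuple[list[float], float]]


def weights_from_support_vectors(svs: list[list[float]], coefs: list[float]) -> list[float]:
    """Linear kernel only: w = sum_i coef_i * sv_i, where coef_i = alpha_i * y_i."""
    w = [0.0] * len(svs[0])
    for sv, c in zip(svs, coefs):
        for j, v in enumerate(sv):
            w[j] += c * v
    return w


def decision(x: list[float], w: list[float], b: float) -> float:
    """Signed score: positive suggests the input matches the category."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def classify(x: list[float], model: Model) -> tuple[Optional[str], dict[str, float]]:
    """Score every one-vs-rest classifier and keep the best positive one."""
    scores = {cat: decision(x, w, b) for cat, (w, b) in model.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores
```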
At block 310, user feedback may be accepted to update the model parameters in a feedback/adaptation loop. For example, during a training phase or thereafter, a user may assert that a query belongs to a certain category. Updating the model parameters may result in better classification decisions.
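One simple way to fold a user's assertion back into the model is a perceptron-style update of the per-category linear weights. This is a swapped-in technique for illustration, not the retraining procedure of any described embodiment; the learning rate `lr` and the one-vs-rest layout (category mapped to weights and bias) are assumptions.

```python
def apply_feedback(x: list[float], true_cat: str, model: dict, lr: float = 0.1) -> dict:
    """Perceptron-style update on one-vs-rest linear classifiers.

    Raises the asserted category's score and lowers the score of any other
    category that wrongly scores non-negative. Mutates and returns model.
    """
    for cat, (w, b) in list(model.items()):
        score = sum(xi * wi for xi, wi in zip(x, w)) + b
        target = 1.0 if cat == true_cat else -1.0
        if target * score <= 0:  # wrong side of (or on) the decision boundary
            new_w = [wi + lr * target * xi for wi, xi in zip(w, x)]
            model[cat] = (new_w, b + lr * target)
    return model
```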
Named entity recognition is usually treated as the problem of determining the semantic label of a particular word representing a named entity in the presence of other words, or context. Prior art solutions rely heavily on contextual features such as punctuation, properties of the words that precede and/or follow the word in question, and syntactic information parsed from the whole sentence. However, in personal image and video database indexing, classification, and retrieval, such context information is largely unavailable because the supplied annotation is sparse and succinct.
Embodiments of the present invention recognize this fact and focus primarily on the word itself (i.e., the head noun) rather than on its context. Context independence is necessary for usage scenarios with sparse annotation and possibly real-time input typed by a user, such as a personal multimedia collection application. In this scenario, embodiments of the present invention go beyond straightforward dictionary-based processing by aggregating information, synchronously and asynchronously, from diverse information sources and by applying different processing techniques. In at least one embodiment, exact lexical matching may be combined with approximate similarity models (e.g., Levenshtein distance) applied to the data gathered from heterogeneous sources such as dictionaries, gazetteers, and semantic lexicons. The data is then processed with a supervised machine learning technique that allows the user to extend, adapt, and modify the semantics of the personalized annotation tags of items in a personal multimedia collection, as well as the structure of relationships among them. This structure of relationships represents a personalized semantic hierarchy of named entities that may be coupled with other known content-based retrieval methods to provide a more intelligent and natural way to organize, access, and interact with personal digital media collections. Embodiments of the present invention may be used for extensible named entity hierarchy processing to enable real-time multimedia mining applications for personal multimedia databases.
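The combination of exact lexical matching with a Levenshtein-based similarity can be sketched as follows. Normalizing the edit distance by the longer string's length is one common choice and is an assumption here, as is returning 1.0 immediately on an exact case-insensitive match; `entries` stands for hypothetical data gathered from a dictionary or gazetteer.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_score(term: str, entries: list[str]) -> float:
    """1.0 on an exact (case-insensitive) match, else the best normalized similarity."""
    t = term.casefold()
    best = 0.0
    for entry in entries:
        e = entry.casefold()
        if e == t:
            return 1.0  # exact lexical match wins outright
        dist = levenshtein(t, e)
        best = max(best, 1.0 - dist / max(len(t), len(e)))
    return best
```

A score like this can serve as one more element of the feature vector, letting a misspelled tag such as "Berln" still register as close to a gazetteer entry.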
Although the operations described herein may be described as a sequential process, some of the operations may in fact be performed in parallel or concurrently. In addition, in some embodiments the order of the operations may be rearranged.
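Because the reference resources are independent of one another, the queries of block 304 are natural candidates for concurrent execution. A minimal sketch using Python's standard `concurrent.futures` module follows; the resource callables are hypothetical, and mapping a failed lookup to an empty response (rather than aborting classification) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def query_all(term: str, resources: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Submit the same term to every resource concurrently."""

    def safe(fetch: Callable[[str], str]) -> str:
        try:
            return fetch(term)
        except Exception:
            return ""  # treat an unavailable resource as an empty response

    with ThreadPoolExecutor(max_workers=max(1, len(resources))) as pool:
        futures = {name: pool.submit(safe, fetch) for name, fetch in resources.items()}
        return {name: fut.result() for name, fut in futures.items()}
```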
The techniques described herein for the named entity recognition system and personal multimedia application are not limited to any particular hardware or software configuration; they may find applicability in any computing or processing environment. The techniques may be implemented in hardware, software, or a combination of the two. The techniques may be implemented in programs executing on programmable machines such as mobile or stationary computers, personal digital assistants, set top boxes, cellular telephones and pagers, and other electronic devices, that each include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and one or more output devices. Program code is applied to the data entered using the input device to perform the functions described and to generate output information. The output information may be applied to one or more output devices. One of ordinary skill in the art may appreciate that the invention can be practiced with various computer system configurations, including multiprocessor systems, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks may be performed by remote processing devices that are linked through a communications network.
Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a processing system. However, programs may be implemented in assembly or machine language, if desired. In any case, the language may be compiled or interpreted.
Program instructions may be used to cause a general-purpose or special-purpose processing system that is programmed with the instructions to perform the operations described herein. Alternatively, the operations may be performed by specific hardware components that contain hardwired logic for performing the operations, or by any combination of programmed computer components and custom hardware components. The methods described herein may be provided as a computer program product that may include a tangible machine accessible medium having stored thereon instructions that may be used to program a processing system or other electronic device to perform the methods. The term “machine accessible medium” used herein shall include any medium that is capable of storing or encoding a sequence of instructions for execution by a machine and that causes the machine to perform any one of the methods described herein. The term “machine accessible medium” shall accordingly include, but not be limited to, solid-state memories, optical and magnetic disks, and a carrier wave that encodes a data signal. Furthermore, it is common in the art to speak of software, in one form or another (e.g., program, procedure, process, application, module, logic, and so on) as taking an action or causing a result. Such expressions are merely a shorthand way of stating that the execution of the software by a processing system causes the processor to perform an action or produce a result.