1. Technical Field
The present disclosure relates to a method and system for generating concept-specific data representations for multi-concept detection, and more particularly, to a system and method which employs more than one data representation in concept detection.
2. Description of the Related Art
Data management requires the generation of meta-data for facilitating efficient indexing, filtering and searching capabilities. It is often necessary to develop tools that allow users to associate concepts with data. However, the abundance of data and diversity of concepts make this a difficult and overly expensive task. In particular, the task of detecting the concept using the appropriate set of one or more data representations is extremely important.
Given that data management and data management systems are essential in virtually every industry, concept detection is becoming more important in data management applications. Learning and classification techniques are increasingly relevant to state-of-the-art data management systems. From relevance feedback to statistical semantic modeling, there has been a shift in the amount of manual supervision needed, from lightweight classifiers to heavyweight classifiers.
It is therefore a consequence that machine learning and classification techniques make an increasing impression on the state of the art in data management. Techniques that use data representations for concept detection include, for example, Naphade et al. (Naphade et al., “A Framework for Moderate Vocabulary Semantic Visual Concept Detection”, IEEE International Conference on Multimedia and Expo 2003). Similar techniques exist for detection of concepts from text, media, etc.
One important issue is the type of representation used for detection of information in data. In some cases, the representation may include all the data (an image, a video, a text document, etc.) or part of the data (a region in an image, a paragraph in a document, etc.). In many cases, a fixed set of multiple representations is used. Prominent among these are the multi-scale techniques that use wavelet-based processing for detection, as in Koller et al. (T. Koller et al., “Multiscale detection of curvilinear structures in 2-D and 3-D image data”, 5th International Conference on Computer Vision, June 1995).
Multi-scale techniques are one instance of how multiple representations can be developed. However, in conventional techniques, the procedure that creates the representation is not determined based on the set of concepts that are to be detected in the representation. Instead, a concept is merely searched for in the given content without adapting the representation to the type of concept being searched.
A system and method for detecting a concept from digital content are provided. A plurality of representations is generated for the same data content for concept detection from the plurality of representations. A plurality of concepts is simultaneously detected from the plurality of representations of the same data content, wherein at least one detector provides selection information for selecting the representations generated or a combination of the generated representations. This results in multiple instances of a representation being considered for concept detection.
A method for detecting a concept from digital content includes providing digital content, representing the digital content in a plurality of representations, generating a set of regions for each of the plurality of representations for the same data content, simultaneously detecting a plurality of concepts from the regions, scoring each region based on confidence that the concepts exist in each region, and processing region scores.
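By way of illustration only, the steps of the method may be sketched in Python as follows. The sketch is not the disclosed implementation: the image format, the two representations (`whole_image`, `halves`), the toy detectors, and the use of the maximum region score as the "processing" step are all assumptions introduced for the example.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration: an image is a 2-D list of pixel
# intensities in [0, 1]; a region is simply a smaller image.
Image = List[List[float]]
Detector = Callable[[Image], float]            # returns a confidence in [0, 1]
Representation = Callable[[Image], List[Image]]


def whole_image(image: Image) -> List[Image]:
    """Representation with a single region covering all the content."""
    return [image]


def halves(image: Image) -> List[Image]:
    """Representation with two regions: the top half and the bottom half."""
    mid = len(image) // 2
    return [image[:mid], image[mid:]]


def mean_intensity(region: Image) -> float:
    """Average pixel value of a region."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels)


def detect_concepts(
    image: Image,
    representations: List[Representation],
    detectors: Dict[str, Detector],
) -> Dict[str, float]:
    """Score every region of every representation for every concept.

    The per-concept result is the best region score; taking the maximum is
    an assumed way of processing the region scores.
    """
    scores = {}
    for concept, detector in detectors.items():
        region_scores = [
            detector(region)
            for make_regions in representations
            for region in make_regions(image)
        ]
        scores[concept] = max(region_scores)
    return scores


# Toy detectors standing in for trained concept models (assumption).
detectors = {
    "bright": mean_intensity,
    "dark": lambda region: 1.0 - mean_intensity(region),
}
image = [[1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
scores = detect_concepts(image, [whole_image, halves], detectors)
```

In this toy case the whole-image representation alone yields a confidence of 0.5 for both concepts, whereas adding the two-region representation raises both to 1.0, because each concept occupies only part of the content.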
A system for detecting a concept from digital content includes a representation generation module, which represents digital content in a plurality of representations by generating a set of regions for each of the plurality of representations for the same data content. At least one concept detector simultaneously detects a plurality of concepts from the regions by comparing data in the region to concept models and scoring each region based on confidence that the concept exists in that region.
These and other objects, features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
A method and system for generating concept-specific data representations for multi-concept detection are provided. The method and system generate one or more representations, and the generation process is decided jointly by all the concepts in the list. This may include combining one or more representations, which are segmented using different techniques to make the combined representation suitable for improved concept detection. One aspect of the present disclosure is to avoid using the same fixed data representation for all concept detection purposes.
Instead, the present embodiments consider one or more alternative data representations and generate one final concept-specific data representation for detection purposes, where the final representation generation process is determined based upon a given set of concepts that need to be detected.
The present illustrative embodiments are applicable to all forms of data including multimedia data, text, rich media, hypertext, documents, etc. If the concept detection process needs a priori creation of concept models, a first procedure of representation generation for the purposes of concept model creation need not be the same as a second procedure of representation generation that is used for concept detection. Representation generation is a process or processes, which are employed to generate a collection of data, such as an image, an audio composition, etc. A concept model is a model used for comparison to identify a concept in given data.
The present illustrative embodiments do not require knowledge of the procedure for representation generation used for the creation of concept models. Instead, the present disclosure creates the final concept-specific and potentially data-redundant representation simultaneously based on all the concepts in a set.
One important concept is to avoid merely using the single given data representation for concept detection, especially where multiple concepts are listed in a set. Instead, one or more representations are generated jointly by all the concepts in the list that need to be detected. For example, in multimedia annotation, the user is permitted to have a list of concepts such as “face”, “sky”, “car” and create concept-specific representations in terms of grids, layouts, or segments of the multimedia content, where the representations are created jointly based on the three concepts in the list. For example, since the concepts include a face, sky and car, the image will be segmented in a way that will permit the best chance of identifying these concepts in the image. This may include using semantic or relational information to isolate regions of the image. Illustratively, the sky is typically blue and is usually found at the top of the image. A car is often on a surface, such as an asphalt roadway, and includes wheels. A face has determinable features, which can be relied upon to identify one in the image content.
It should be understood that the illustrative embodiments described herein are not limited to multimedia data alone and can be applied to all forms of data from which concepts need to be detected, including text, rich media, hypertext, documents, etc. In addition, these embodiments do not require that the procedure of representation generation that is used for concept detection be identical to the scheme of representation generation that is needed during the creation of the concept models used for detection. Advantageously, the illustrative embodiments do not need to know the procedure of representation generation used during the creation of the concept models used for detection.
It should be further understood that the elements shown in the FIGS. may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose digital computers having a processor and memory and input/output interfaces.
Referring now to the drawings in which like numerals represent the same or similar elements and initially to
Given a piece of content at a given modality and granularity, there are multiple representations of the same content at a finer granularity. For example, an image can be represented at a finer granularity as a set of image regions, and there are multiple sets of image regions that can represent the same image, as illustratively shown in
Referring to
The grid-based representation 102 is an example of a complete representation, or one where the set of finer-granularity content pieces (e.g., the image regions 104) cover the entire content piece at the coarser granularity (e.g., the whole image 100). The grid-based representation 102 is also a non-redundant representation, or one where the set of finer-granularity content pieces (e.g., the image regions 104) are mutually exclusive (e.g., do not overlap).
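The two properties defined above can be made concrete with a short Python sketch. It is offered only as an illustration under stated assumptions: the rectangle format `(left, top, right, bottom)` with exclusive right and bottom edges, and the helper names `grid_regions`, `is_complete` and `is_non_redundant`, are hypothetical and do not appear in the disclosure.

```python
from typing import List, Tuple

# Hypothetical region format: (left, top, right, bottom) in pixels,
# with the right and bottom edges exclusive.
Region = Tuple[int, int, int, int]


def grid_regions(width: int, height: int, rows: int, cols: int) -> List[Region]:
    """Split a width x height image into rows x cols rectangular regions."""
    regions = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            regions.append((left, top, right, bottom))
    return regions


def coverage_counts(width: int, height: int, regions: List[Region]) -> List[List[int]]:
    """Count, for every pixel, how many regions contain it."""
    counts = [[0] * width for _ in range(height)]
    for left, top, right, bottom in regions:
        for y in range(top, bottom):
            for x in range(left, right):
                counts[y][x] += 1
    return counts


def is_complete(width: int, height: int, regions: List[Region]) -> bool:
    """Complete: every pixel of the image lies in at least one region."""
    return all(n >= 1 for row in coverage_counts(width, height, regions) for n in row)


def is_non_redundant(width: int, height: int, regions: List[Region]) -> bool:
    """Non-redundant: no pixel lies in more than one region."""
    return all(n <= 1 for row in coverage_counts(width, height, regions) for n in row)


grid = grid_regions(10, 6, rows=2, cols=3)
overlapping = grid + [(0, 0, 5, 5)]   # extra region overlaps the grid cells
partial = grid[1:]                    # first grid cell removed
```

Here `grid` is both complete and non-redundant, `overlapping` remains complete but is redundant, and `partial` is non-redundant but no longer complete.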
Referring to
When a content representation is complete and non-redundant, it is called a segmentation of the content. One example of segmentation for the image of
Referring to
Referring to
The given representation of the content may not be the most appropriate representation for the detection of some concepts, however. For example, many concepts are regional by nature and by definition may occupy only a portion of the provided content. In other words, a different portion or region in an image may have different significance based upon information in other regions of the image. These relationships may be dealt with by appropriately training the system using, for example, concept models to provide this information.
Examples of such concepts along with the associated content regions they occupy are illustratively shown in
Referring to
In some cases (e.g., for detection of regional concepts), the input content may need a different content representation (e.g., set of regions 504) than the given content representation (e.g., an image 100) to improve detection performance. This process of improving a representation, called a representation generation process, includes producing, by module 502, a representation at a finer content granularity than the given content granularity.
Examples of the representation generation process include but are not limited to grid-based representation generation (
Referring to
After tuning and optimization (adjustment) of the data representation provided by feedback path 603, concept detection is applied as before using the concept detection module(s) 402 or 506 to generate concept labels 404 and corresponding detection confidence scores 406 for the input content. Note that changes in the set of target concepts may adjust the manner and method of parameter adjustments and optimization. For example, eliminating “indoors” from the target concept list would enable the tuning module 602 to focus the concept search on the person's image rather than the entire image.
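One possible form of such feedback-driven tuning is sketched below in Python, purely as an illustration. The disclosure does not specify the tuning procedure; the sketch assumes that the tunable parameter is a grid size, that the feedback signal is the sum over target concepts of the best region confidence, and that the toy detectors stand in for trained concept models. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[float]]              # 2-D list of intensities in [0, 1]
Detector = Callable[[Image], float]    # returns a confidence in [0, 1]


def grid(image: Image, rows: int, cols: int) -> List[Image]:
    """Cut the image into rows x cols rectangular sub-images."""
    h, w = len(image), len(image[0])
    regions = []
    for r in range(rows):
        band = image[r * h // rows:(r + 1) * h // rows]
        for c in range(cols):
            regions.append([row[c * w // cols:(c + 1) * w // cols] for row in band])
    return regions


def mean_intensity(region: Image) -> float:
    """Average pixel value of a region."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels)


def joint_score(image: Image, rows: int, cols: int,
                detectors: Dict[str, Detector]) -> float:
    """Feedback signal: sum over all target concepts of the best region score."""
    regions = grid(image, rows, cols)
    return sum(max(d(region) for region in regions) for d in detectors.values())


def tune_grid(image: Image, candidates: Sequence[Tuple[int, int]],
              detectors: Dict[str, Detector]) -> Tuple[int, int]:
    """Pick the grid size whose regions best serve the whole concept set."""
    return max(candidates, key=lambda rc: joint_score(image, rc[0], rc[1], detectors))


# Toy detectors standing in for trained concept models (assumption).
detectors = {
    "bright": mean_intensity,
    "dark": lambda region: 1.0 - mean_intensity(region),
}
# 4 x 4 image whose top-left quadrant is bright and the rest dark.
image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
best = tune_grid(image, [(1, 1), (2, 2)], detectors)
```

In this example the single-region candidate gives a joint score of 1.0 and the 2 x 2 grid gives 2.0, so the 2 x 2 grid is selected. Because the score is summed over the target concept set, changing that set changes the score and may change which candidate is selected.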
Also, note that the set of concepts is dealt with simultaneously, such that all concepts are defined and scored within the representation or representations at the same time. An example of how a preferred embodiment may work for the detection of a single concept “Face” is illustrated in
Referring to
Therefore, in accordance with the present disclosure, redundant content may be employed to find a single concept or a set of concepts, simultaneously. The content may be employed to find the concepts in representations by adjusting the parameters of the generation of representations to improve the likelihood of successful concept detection. Combinations of these abilities and features are also contemplated and are considered within the scope of the present invention.
Having described preferred embodiments of a system and method for generating concept-specific data representation for multi-concept detection (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope and spirit of the invention as outlined by the appended claims.