1. Field of the Invention
The invention disclosed and claimed herein generally pertains to a method for using data content disseminated by multiple channels, in order to improve the response to a specified request for information. More particularly, the invention pertains to a method of the above type wherein the multiple channels distribute data content supplied by different multimedia sources, such as Internet websites, television broadcasts, IPTV and wireless device communications. Even more particularly, the invention pertains to a method of the above type that is adapted to exploit complementary and correlated information provided by the multiple channels of distribution, in order to provide deeper insight into the underlying semantics of the data content, and also to provide more coherent information threads.
2. Description of the Related Art
Expansive video information dissemination, via multiple distribution sources, poses an increasing challenge for intelligence analysts. This dissemination of information now includes global sources, such as foreign news broadcasts, and further includes distributed multi-source multimedia (image, video and audio), Internet websites, and wireless personal communications. Such enormous expansion in information dissemination presents a new and overwhelming challenge for efficient content understanding and indexing. Existing multimedia content analysis and search services are typically based on processing and analysis of textual features such as multimedia file names, textual captions, speech transcripts and associated tags, which of course assumes that such tags exist. Providers of these services include, for example, Google and its associate YouTube, Yahoo Video and its associate Flickr, Blinkx TV, and MySpace. Various speech recognition and machine translation techniques are used to enhance the existing textual features. However, such dependence on text makes the understanding and search of multimedia content unreliable when dealing with content from sources without adequate textual information, or with foreign sources.
At present, solutions to multimedia indexing mainly analyze a single source or instance of the provided data content, or deal with only a single channel of distribution that provides one snapshot into the semantics of the content. Traditional text-based indexing is generally not appropriate for multimedia content, where a content description can have different meanings, or where text does not describe the digital content sufficiently well. Text-based indexing is also unreliable when dealing with content from foreign sources or sources without adequate textual information. Research on topic threading, summarization and linking that relies on textual features, drawn from news wires or from speech recognition or machine translation transcripts of news video, is discussed in the DARPA Translingual Information Detection Extraction and Summarization (TIDES) program for “Topic Detection and Tracking” (TDT). Existing web summarization and linking services, e.g. Google News and Blinkx TV, are narrow in scope and typically based on text, file names or closed captions. Sun, J., Wang, X., Shen, D., Zeng, H., and Chen, Z., “Mining clickthrough data for collaborative web search,” International Conference on World Wide Web (WWW), May 2006, discusses improving web search performance by exploiting group behavior patterns of search activities, based on click-through data. However, while there has been research on mining web-blog patterns and on mining web tags to extract relevant annotations, the rich information contained in the visual and temporal dimensions of multimedia content has been largely ignored.
On the content exploration side, current mining methods derive associations only within one domain, and thus likewise have a very narrow scope. Associations in video domains are discussed by X. Zhu, X. Wu, A. K. Elmagarmid, Z. Feng, and L. Wu in “Video Data Mining: Semantic Indexing and Event Detection from the Association Perspective,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 665-677, May 2005, and by Kender and Naphade in “Visual concepts for news story tracking: Analyzing and exploiting the NIST TRECVID video annotation experiment,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1174-1181, 2005. Tesic and Smith, in “Semantic Labeling of Multimedia Content Clusters,” IEEE Intl. Conf. on Multimedia and Expo (ICME), 2006, extend the scope of video summarization to allow users to navigate the semantic and metadata space of a video data set more efficiently. Each of these methods, however, mines information and derives associations within only one multimedia domain. Little effort has previously been devoted to predicting important patterns in a new domain, or to using patterns to extract threads or to label similar content across domains. This further supports the conclusion that rich multimedia information over multiple sources has been largely ignored.
In view of the drawbacks described above, there is a growing need to both enrich semantic metadata for multimedia objects provided by multiple sources, and to support content analysis, understanding and search across multiple domains. In the absence of a means or method that addresses this problem, search is limited to a specific domain such as: (i) Keyword search for text; (ii) Context search over text based on keywords and ontologies/dictionaries; (iii) Video retrieval based on speech recognition, closed captioning, manual annotations and visual semantics within narrow scope; and (iv) Picture search based on tagging, file name, and camera metadata.
The invention provides an efficient and scalable solution for multimedia linking, in order to ensure more efficient multimedia data access. Embodiments of the invention exploit the complementary and correlated information that is available across multiple channels of digital multimedia distribution, in order to provide both deeper insights into the underlying semantics of the content and more coherent information threads over information channels. Embodiments of the invention can facilitate integrated search over text, video and pictures, by correlating the multiple channels of available information (horizontal-based search), and can also allow content resolution for thread extraction and for deeper understanding of a given context (vertical search space).
One embodiment of the invention, directed to a method for generating a response to a specified request for information, is associated with multiple channels that are each adapted to carry and disseminate data content. The method comprises extracting data elements from each of the channels, wherein each extracted data element pertains to at least one dimension of a plurality of correlation related dimensions. The method further comprises assigning each extracted data element to one of a plurality of correlation sets, wherein all the extracted data elements assigned to a particular set pertain to the same correlation related dimension, and at least one of the sets is assigned data elements extracted from two or more different channels. Two or more of the correlation sets associated with the request are then selected, and the data content thereof is used to generate the response to the specified request.
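The method of this embodiment can be illustrated with a minimal sketch. It assumes that extraction has already happened and that each extracted element already carries a label naming the correlation-related dimension it pertains to; the `DataElement` structure, the dimension labels and the function names are hypothetical and chosen only for illustration. Elements are assigned to one correlation set per dimension, the sets that span two or more channels are identified, and the sets associated with a request are pooled to form the response.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One extracted data element (hypothetical structure for illustration)."""
    channel: str    # channel the element was extracted from, e.g. "web" or "tv"
    dimension: str  # correlation-related dimension the element pertains to
    content: str    # the extracted semantic data or metadata


def build_correlation_sets(elements):
    """Assign each extracted element to the correlation set for its dimension."""
    sets = defaultdict(list)
    for element in elements:
        sets[element.dimension].append(element)
    return dict(sets)


def cross_channel_dimensions(sets):
    """List the dimensions whose sets hold elements from two or more channels."""
    return [dim for dim, members in sets.items()
            if len({m.channel for m in members}) >= 2]


def respond(request_dimensions, sets):
    """Select the sets associated with the request and pool their content."""
    return [m.content
            for dim in request_dimensions if dim in sets
            for m in sets[dim]]


elements = [
    DataElement("web", "image:flood_photo", "alt text: river flood"),
    DataElement("tv", "image:flood_photo", "transcript: flooding downtown"),
    DataElement("web", "topic:evacuation", "page text: evacuation routes"),
]
sets = build_correlation_sets(elements)
print(cross_channel_dimensions(sets))
print(respond(["image:flood_photo", "topic:evacuation"], sets))
```

Here the set for `image:flood_photo` satisfies the condition that at least one set holds elements from two different channels, and the response draws on two selected sets. In a real system, the pooling step in `respond` would be replaced by whatever analysis the request calls for.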
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
Referring to
Each channel 102 originates with one of the sources in a set of distributed multimedia information sources (not shown). By way of example and not limitation, such sources could include Internet websites, amateur radio archives, broadcast news, libraries, newspapers, business and government archives, movies, television shows and information contained in scientific and medical databases. Thus, the information provided by multiple channels 102 comprises unstructured multimodal information, and can be in a variety of forms including, without limitation, text, audio, video, graphics and/or images. As an illustration, a particular image or video clip can be used in both an Internet website and a television broadcast. Information provided by the website in regard to the image or video clip can include web page text and metadata, such as alt tag, image name and URL. Television broadcasts may furnish information such as speech transcripts, and also the identities of television programs that displayed the image or video clip. It is anticipated that activities pertaining to the image or video clip, such as searching, analysis and indexing, can be significantly improved by using all of this information cumulatively.
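The benefit of using the website and broadcast information cumulatively can be shown with a small sketch. It assumes that the same image or clip has already been recognized across channels and given a shared identifier (for example by near-duplicate detection); the record layout, the identifiers such as `clip42`, and the substring matching used as a stand-in for search are all hypothetical.

```python
from collections import defaultdict


def merge_profiles(records):
    """Pool the textual metadata that each channel attaches to the same object.

    records: iterable of (object_id, channel, field_name, value) tuples, e.g. a
    web page's alt text or a broadcast's speech transcript for one clip.
    """
    profiles = defaultdict(lambda: defaultdict(list))
    for object_id, channel, field_name, value in records:
        profiles[object_id][channel].append((field_name, value))
    return profiles


def search(profiles, term):
    """Return ids of objects whose pooled metadata, from any channel, mentions term."""
    term = term.lower()
    return sorted(oid for oid, by_channel in profiles.items()
                  if any(term in value.lower()
                         for entries in by_channel.values()
                         for _, value in entries))


records = [
    ("clip42", "web", "alt", "Harbor bridge at dusk"),
    ("clip42", "tv", "transcript", "Officials closed the bridge after the storm"),
    ("clip7", "web", "url", "http://example.com/parade.jpg"),
]
profiles = merge_profiles(records)
print(search(profiles, "storm"))
```

A query for "storm" finds `clip42` only because the broadcast transcript is pooled with the web metadata; an index built from the website alone would miss it.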
As a first step in the procedure of responding to a request, system 100 may crawl or visit respective channels 102, and follow hyperlinks thereof, to select particular channels and related multimedia objects that are pertinent to information request 104. Referring further to
At function block 112, an element or artifact of data content in one of the channels 102 is compared with data elements in one or more of the other channels, in order to identify multimedia objects or data content in different channels 102 that is highly correlated. Usefully, correlation is implemented by content-based similarity identification or clustering of objects in different channels, or by near-duplicate detection of multimedia objects and text streams. In similarity detection, an effort is made to locate exact copies or very similar content of particular data content or multimedia objects in different channels, wherein the multiple channels collectively contain unstructured multimodal information content. Similarity detection can be used for data content or objects such as images, video, audio, text, and graphics content.
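Near-duplicate detection of text streams, one of the correlation techniques named above, can be sketched with word shingles and Jaccard similarity. This is a standard technique swapped in for illustration, not necessarily the one used by system 100; the shingle length `k`, the `threshold` value and the `(channel, text)` input format are assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text, after lowercasing and stripping punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(items, threshold=0.5, k=3):
    """Return (i, j, score) for each pair of items from different channels whose
    shingle sets have Jaccard similarity >= threshold.

    items: list of (channel, text) pairs.
    """
    sigs = [(channel, shingles(text, k)) for channel, text in items]
    pairs = []
    for i in range(len(sigs)):
        for j in range(i + 1, len(sigs)):
            if sigs[i][0] == sigs[j][0]:
                continue  # only cross-channel correlations are of interest here
            score = jaccard(sigs[i][1], sigs[j][1])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


items = [
    ("web", "Floodwaters rose overnight in the river district, forcing evacuations."),
    ("tv", "Floodwaters rose overnight in the river district forcing evacuations"),
    ("tv", "The city council approved a new budget on Tuesday."),
]
print(near_duplicates(items))
```

The web caption and the broadcast transcript differ only in punctuation and are reported as a match. The pairwise loop is quadratic; at the scale of many channels, a technique such as MinHash with locality-sensitive hashing would typically replace it.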
In embodiments of the invention, the correlation effort compares data elements such as semantic data and metadata extracted as described above, in order to identify similar content or multimedia objects in the different channels 102. In some of these embodiments, as described above, the correlation effort directly corresponds to and is defined by information request 104. In other embodiments, however, as described hereinafter, the correlation effort uses extracted semantics and metadata that were not generated in response to the information request.
It will be readily apparent that in order to correlate data content that has been disseminated or distributed by different channels, as described above, there must be a common basis, characteristic or feature that defines correlation. Herein, the term “dimension” is used to mean a particular basis for data content correlation. For example, a particular image may be widely used across multiple channels 102 in different contexts and with different texts. At the same time, a particular paragraph of text may be used with the particular image, but may also be used with a number of other images across the channels. In this situation, the image would define one dimension of correlation, encompassing each data element provided by the multiple channels that contains the image, regardless of context or accompanying text. The paragraph would define another dimension, encompassing each data element that contains the paragraph, likewise regardless of context or associated images.
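Treating a particular image as a dimension requires recognizing that image in every channel, even after re-encoding or minor edits. One common way to do this, shown here as an assumption rather than as the method of the invention, is a perceptual "average hash" compared by Hamming distance. The sketch assumes each image has already been reduced to an 8x8 grid of grayscale values (a real system would use an imaging library for that step), and the `max_distance` tolerance is hypothetical.

```python
def average_hash(pixels):
    """Compute a 64-bit average hash from an 8x8 grid of grayscale values (0-255).

    Each bit is 1 when the corresponding pixel is brighter than the grid's mean.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / 64
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def same_image(a, b, max_distance=5):
    """Treat two grids as the same image when their hashes differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[min(255, p + 10) for p in row] for row in original]
inverted = [[255 - p for p in row] for row in original]
print(same_image(original, brighter))  # True
print(same_image(original, inverted))  # False
```

Because the hash depends only on the image itself, elements carrying the same picture fall along the same dimension whatever text surrounds them, which is the property the image dimension requires.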
After identifying correlated content that has been obtained from different multimedia channels, based on respective dimensions of correlation, collateral information relevant to the distribution of the correlated content is analyzed, wherein examples of such collateral information could include speech transcripts, closed captions, website text and related multimedia content, such as previous and subsequent videos in the same news source, and direct links from the same URL. Following analysis, the correlated information is grouped or placed into correlation sets, or syntactic containers 116, wherein each set or container is associated with a dimension of correlation. Each syntactic container 116 is a dynamic structure that only holds data content that is highly correlated with its associated dimension.
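A syntactic container can be modeled as a small dynamic structure tied to one dimension. The sketch below is hypothetical: it assumes a pluggable `similarity(dimension, content)` function returning a score in [0, 1], admits content only at or above a threshold, keeps collateral information (transcripts, captions, page text) with each member, and interprets "dynamic" as the ability to re-check members when the threshold changes. The toy `word_overlap` function stands in for a real content-based similarity measure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Member:
    channel: str
    content: str
    collateral: List[str]  # e.g. speech transcripts, closed captions, page text


@dataclass
class SyntacticContainer:
    """Holds only content highly correlated with its dimension (hypothetical model)."""
    dimension: str
    similarity: Callable[[str, str], float]
    threshold: float = 0.8
    members: List[Member] = field(default_factory=list)

    def offer(self, channel, content, collateral=()):
        """Admit content if it correlates with the dimension; report whether it was admitted."""
        if self.similarity(self.dimension, content) < self.threshold:
            return False
        self.members.append(Member(channel, content, list(collateral)))
        return True

    def retune(self, threshold):
        """Change the threshold and drop members that no longer qualify."""
        self.threshold = threshold
        self.members = [m for m in self.members
                        if self.similarity(self.dimension, m.content) >= threshold]


def word_overlap(a, b):
    """Toy similarity: Jaccard overlap of lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


box = SyntacticContainer("river flood evacuation", word_overlap, threshold=0.5)
box.offer("web", "river flood evacuation", ["alt text: flood photo"])
box.offer("tv", "river flood evacuation update", ["transcript: roads closed"])
box.offer("tv", "city budget vote")
print([m.channel for m in box.members])
```

Keeping collateral information per member, rather than per container, means it is discarded together with any member that later stops qualifying.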
As stated above, the correlation effort and the creation of correlation sets or syntactic containers 116 are closely associated with the extracted semantic data and metadata. The extracted semantics and metadata are used in the correlation procedure to identify similar and near-duplicate data across the multiple channels, and thus to construct syntactic containers 116. As indicated above, in one mode the extracted semantics and dimensions of correlation are defined by a particular request. Accordingly, in that mode syntactic containers 116 are constructed in response to, and thus after receiving, the request for information.
In another mode, a large number of syntactic containers 116 are constructed in accordance with the correlation procedure described above, wherein each container is associated with a pre-specified dimension of correlation.
It is anticipated that certain semantics and metadata associated with a multimedia object or content in one of the multiple channels 102 will be specific to that channel, and thus will not correlate with content in any of the other channels.
Referring to
Referring further to
Alternatively, each of the syntactic containers 116 could have been constructed on the fly, after receiving a particular request 104, in order to provide highly correlated information along dimensions pertinent to the request.
It is to be emphasized that creation of virtual semantic context container 210 provides two levels of content from multiple channels 102 in
As a further benefit, the configuration provided by semantic context container 210 can be used to carry out different types of searches, usefully referred to as horizontal and vertical searches. As indicated by result 212 shown in
A vertical search is integrated over multiple channel domains, to locate information that is relevant and thus personalized to the request.
Referring to
Referring to
At step 404, it is necessary to determine whether an extended indexing structure, of the type described above, is available for use. To be usable, an indexing structure would have to have syntactic containers corresponding to all of the dimensions of correlation that are defined by the information request. If this is not the case, the method proceeds to step 406, to extract semantic data and metadata from data content in each of the multiple channels, such as channels 102 in
Following extraction of respective data elements, extracted elements from different channels are correlated with one another, at step 408. For example, successive extracted data elements could be compared with a dimension of correlation, and would be accepted if they were found to be identical or similar to the dimension, to within a pre-specified degree. All such data elements from different channels would be highly correlated with one another, and would then all be placed in a syntactic container corresponding to the dimension. Step 410 shows creation of syntactic containers for the respective dimensions of correlation. Step 412 shows placement of the created syntactic containers, together with the data content thereof, into a virtual semantic context container as described above in connection with
Referring further to
At step 418, all the data content of the syntactic containers located in the virtual semantic context container is used collectively to respond to the information request. Software tools such as clustering, association rules, and various statistical and prediction packages are examples of tools that could be used to process the data in the virtual semantic context container, in order to provide a response to the information request.
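Steps 404 through 418 can be outlined in a short sketch. The availability test of step 404 becomes a subset check (the index must hold a container for every dimension the request defines); otherwise containers are built from the channels' extracted elements, gathered into a context mapping standing in for the virtual semantic context container, and used collectively. The `matches(dimension, element)` predicate, the plain-string elements and the shape of the response are hypothetical, intermediate steps are omitted, and the final step here merely reports corroborating channels where clustering or association rules could be substituted.

```python
from collections import defaultdict


def index_covers(index, request_dims):
    """Step 404: usable only if a container exists for every requested dimension."""
    return set(request_dims) <= set(index)


def build_containers(channels, request_dims, matches):
    """Steps 406-410: scan each channel's extracted elements and file matches by dimension."""
    containers = defaultdict(list)
    for channel, elements in channels.items():
        for element in elements:
            for dim in request_dims:
                if matches(dim, element):
                    containers[dim].append((channel, element))
    return dict(containers)


def answer(request_dims, channels, matches, index=None):
    """Reuse an existing index when it covers the request; otherwise build on the fly."""
    if index is not None and index_covers(index, request_dims):
        containers = index
    else:
        containers = build_containers(channels, request_dims, matches)
    # Step 412: gather the relevant containers into one context container.
    context = {dim: containers.get(dim, []) for dim in request_dims}
    # Step 418: use the content collectively; here, corroborating channels plus pooled elements.
    return {dim: {"channels": sorted({ch for ch, _ in members}),
                  "elements": [el for _, el in members]}
            for dim, members in context.items()}


def matches(dim, element):
    """Toy predicate: the element mentions the dimension's keyword."""
    return dim in element.lower()


channels = {"web": ["Flood photo on river page", "Budget news"],
            "tv": ["Flood coverage tonight"]}
print(answer(["flood"], channels, matches))
```

Passing a prebuilt `index` that covers the request skips extraction entirely, which corresponds to the mode in which containers are constructed in advance for pre-specified dimensions.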
Referring to
Data processing system 500 exemplifies a computer, in which code or instructions for implementing embodiments of the invention may be located. Data processing system 500 usefully employs a peripheral component interconnect (PCI) local bus architecture, although other bus architectures such as Accelerated Graphics Port (AGP) and Industry Standard Architecture (ISA) may alternatively be used.
The invention can take the form of an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The invention can further take the form of television devices, wireless communication devices, and other devices that can correlate or otherwise process multimedia data of any type.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
This invention was made with Government support under Contract No. 2004*H839800*000 awarded by Advanced Research Development Agency. The Government has certain rights in this invention.