1. Field of the Invention
The present invention relates generally to multimedia environments and, more particularly, to systems and methods for managing multimedia information.
2. Description of Related Art
Much of the information that exists today is not easily manageable. For example, databases exist for storing different types of multimedia information. Typically, these databases treat audio and video differently from text. Audio and video data are usually assigned text annotations to facilitate their later retrieval. Traditionally, the audio and video data are assigned the text annotations manually, which is a time-consuming task. The annotations also tend to be insufficient to unambiguously describe the media content. Automatic database creation systems have been developed but have not solved the problem of ambiguous annotations.
As a result, there is a need for systems and methods for managing multimedia information in a manner that is transparent to the actual type of media involved.
Systems and methods consistent with the present invention address this and other needs by providing multimedia information management in a manner that treats different types of data (e.g., audio, video, and text) the same for storage and retrieval purposes. A set of keys (document, section, and passage) is chosen that is common to all of the data types. Data may be assigned relative to the keys.
In one aspect consistent with the principles of the invention, a system facilitates the searching and retrieval of multimedia data items. The system receives data items from different types of media sources and identifies regions in the data items. The regions include document regions, section regions, and passage regions. Each of the section regions corresponds to one of the document regions and each of the passage regions corresponds to one of the section regions and one of the document regions. The system stores document identifiers that relate to the document regions in separate document records in a document table, section identifiers that relate to the section regions in separate section records in a section table, and passage identifiers that relate to the passage regions in separate passage records in a passage table.
In another aspect consistent with the principles of the invention, a method for storing multimedia data items in a database is provided. The method includes receiving data items from different types of media sources and identifying regions of the data items. The regions include document regions, section regions, and passage regions. Each of the section regions corresponds to one of the document regions and each of the passage regions corresponds to one of the section regions and one of the document regions. The method further includes generating document keys for the document regions, section keys for the section regions, and passage keys for the passage regions. The method also includes storing the document keys in a document table in the database, storing the section keys and corresponding ones of the document keys in a section table in the database, and storing the passage keys and corresponding ones of the document keys and the section keys in a passage table in the database.
In a further aspect consistent with the principles of the invention, a database is provided. The database stores data items relating to different types of media. The data items include regions, such as document regions, section regions, and passage regions. Each of the section regions corresponds to one of the document regions and each of the passage regions corresponds to one of the section regions and one of the document regions. The database includes a document table, a section table, and a passage table. The document table stores document keys that identify the document regions as document records. The section table stores section keys that identify the section regions as section records. The section records also store corresponding ones of the document keys. The passage table stores passage keys that identify the passage regions as passage records. The passage records also store corresponding ones of the section keys and the document keys.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and, together with the description, explain the invention. In the drawings,
The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents.
Systems and methods consistent with the present invention provide multimedia information management in a manner that treats different types of media the same for storage and retrieval purposes.
Multimedia sources 110 may include audio sources 112, video sources 114, and text sources 116. Audio sources 112 may include any source of audio data, such as radio, telephone, and conversations. Video sources 114 may include any source of video data, such as television, satellite, and a camcorder. Text sources 116 may include any source of text, such as e-mail, web pages, newspapers, and word processing documents.
Data analyzers 122-126 may include any mechanism that captures the data from multimedia sources 110, performs data processing and feature extraction, and outputs analyzed, marked up, and enhanced language metadata. In one implementation consistent with the principles of the invention, data analyzers 122-126 include a system, such as the one described in John Makhoul et al., “Speech and Language Technologies for Audio Indexing and Retrieval,” Proceedings of the IEEE, Vol. 88, No. 8, August 2000, pp. 1338-1353, which is incorporated herein by reference.
Data analyzer 122 may receive an input audio stream or file from audio sources 112 and generate metadata therefrom. For example, data analyzer 122 may segment the input stream/file by speaker, cluster audio segments from the same speaker, identify speakers known to data analyzer 122, and transcribe the spoken words. Data analyzer 122 may also segment the input stream/file based on topic and locate the names of people, places, and organizations (i.e., named entities). Data analyzer 122 may further analyze the input stream/file to identify the time at which each word is spoken (e.g., identify a time code). Data analyzer 122 may include any or all of this information in the metadata relating to the input audio stream/file.
Data analyzer 124 may receive an input video stream or file from video sources 114 and generate metadata therefrom. For example, data analyzer 124 may segment the input stream/file by speaker, cluster video segments from the same speaker, identify speakers known to data analyzer 124, and transcribe the spoken words. Data analyzer 124 may also segment the input stream/file based on topic and locate the names of people, places, and organizations. Data analyzer 124 may further analyze the input stream/file to identify the time at which each word is spoken (e.g., identify a time code). Data analyzer 124 may include any or all of this information in the metadata relating to the input video stream/file.
Data analyzer 126 may receive an input text stream or file from text sources 116 and generate metadata therefrom. For example, data analyzer 126 may segment the input stream/file based on topic and locate the names of people, places, and organizations. Data analyzer 126 may further analyze the input stream/file to identify where each word occurs (possibly based on a character offset within the text). Data analyzer 126 may also identify the author and/or publisher of the text. Data analyzer 126 may include any or all of this information in the metadata relating to the input text stream/file.
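By way of illustration only, the following Python sketch shows one way the character offset of each word in a text input might be recorded. The function name `word_offsets` and the regular-expression definition of a word are assumptions made for this example; the description does not specify how data analyzer 126 tokenizes text.

```python
import re

def word_offsets(text):
    """Return (word, character_offset) pairs for each word in the text.

    Assumption: a "word" is any run of letters, digits, or underscores.
    """
    return [(m.group(0), m.start()) for m in re.finditer(r"\w+", text)]

# Each offset counts characters from the start of the text.
offsets = word_offsets("Budget talks resume")
```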
Loader 130 may include logic that receives the metadata from data analyzers 122-126 and stores it in database 140 based on features of the metadata. Database 140 may include a relational database that stores data in a manner transparent to the type of media involved. Database 140 may store the metadata from loader 130 in multiple tables based on features of the metadata. Database 140 will be described in more detail below.
Server 150 may include a computer or another device that is capable of managing database 140 and servicing client requests for information. Server 150 may provide requested information to a client 160, possibly in the form of a HyperText Markup Language (HTML) document or a web page. Client 160 may include a personal computer, a laptop, a personal digital assistant, or another type of device that is capable of interacting with server 150 to obtain information of interest. Client 160 may present the information to a user via a graphical user interface, possibly within a web browser window.
A document refers to a body of media that is contiguous in time (from beginning to end or from time A to time B) which has been processed and from which features have been extracted by data analyzers 122-126. Examples of documents might include a radio broadcast, such as NPR Morning Edition on Feb. 7, 2002, at 6:00 a.m. eastern, a television broadcast, such as NBC News on Mar. 19, 2002, at 6:00 p.m. eastern, and a newspaper, such as the Washington Post for Jan. 15, 2002.
A section refers to a contiguous region of a document that pertains to a particular theme or topic. Examples of sections might include local news, sports scores, and weather reports. Sections do not span documents, but are wholly contained within them. A document may have areas that do not have an assigned section. It is also possible for a document to have no sections.
A passage refers to a contiguous region within a section that has a certain linguistic or structural property. For example, a passage may refer to a paragraph within a text document or a speaker boundary within an audio or video document. Passages do not span sections, but are wholly contained within them. A section may have areas that do not have an assigned passage. It is also possible for a section to have no passages.
Documents, sections, and passages may be considered to form a hierarchy.
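As a minimal sketch of this hierarchy, the following Python example models documents, sections, and passages as nested records and checks the containment rules stated above. The class names, the `is_well_formed` helper, and the use of start and end times in seconds are assumptions made for illustration and are not taken from the description.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Passage:
    key: str
    start: float
    end: float

@dataclass
class Section:
    key: str
    start: float
    end: float
    passages: List[Passage] = field(default_factory=list)

@dataclass
class Document:
    key: str
    start: float
    end: float
    sections: List[Section] = field(default_factory=list)

def is_well_formed(doc):
    """True if every section lies within the document and every
    passage lies within its section."""
    for sec in doc.sections:
        if not (doc.start <= sec.start <= sec.end <= doc.end):
            return False
        for p in sec.passages:
            if not (sec.start <= p.start <= p.end <= sec.end):
                return False
    return True

# Hypothetical 30-minute broadcast. The span from 900 to 1000 seconds has
# no assigned section, and the "Sports" section has no passages; both
# situations are permitted.
doc = Document("NPR", 0.0, 1800.0, [
    Section("News", 0.0, 900.0, [Passage("Speaker A (1)", 0.0, 120.0)]),
    Section("Sports", 1000.0, 1500.0),
])
```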
Returning to
Document key 215 may include a field that uniquely identifies a document. Examples of document keys 215 might include “Joe's word processing document about the proposal,” “NPR Morning Edition on Feb. 7, 2002, at 6:00 a.m. eastern,” or “The Star Trek episode about the tribbles.” Section key 225 may include a field that uniquely identifies a section within a particular document. A section key 225 may be unnamed, such as “Section 1,” or may include a theme or topic identifier, such as “Story about Enron Scandal” or “Budget.” Passage key 235 may include a field that uniquely identifies a passage within a particular section. A passage key 235 may be unnamed, such as “Passage 1,” or may have an identifier that relates it to the particular feature of the document that it matches.
One or more of keys 215, 225, and 235 may be associated with each of tables 210, 220, and 230. For example, document table 210 may include document key 215 as the primary key. Section table 220 may include document key 215 and section key 225 as the primary key. Because document key 215 is the primary key of document table 210, document key 215 is also a foreign key for section table 220. Passage table 230 may include document key 215, section key 225, and passage key 235 as the primary key. Because document key 215 and section key 225 are primary keys of document table 210 and section table 220, respectively, document key 215 and section key 225 are also foreign keys for passage table 230.
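The key relationships among the three tables can be sketched as a relational schema. The example below uses Python's built-in `sqlite3` module with an in-memory database; the choice of SQLite, the table and column names, and the `create_database` helper are assumptions made for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document (
    document_key TEXT PRIMARY KEY
);
CREATE TABLE section (
    document_key TEXT NOT NULL REFERENCES document(document_key),
    section_key  TEXT NOT NULL,
    PRIMARY KEY (document_key, section_key)
);
CREATE TABLE passage (
    document_key TEXT NOT NULL,
    section_key  TEXT NOT NULL,
    passage_key  TEXT NOT NULL,
    PRIMARY KEY (document_key, section_key, passage_key),
    FOREIGN KEY (document_key) REFERENCES document(document_key),
    FOREIGN KEY (document_key, section_key)
        REFERENCES section(document_key, section_key)
);
"""

def create_database():
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enabled, an attempt to store a section record whose document key has no matching document record is rejected with `sqlite3.IntegrityError`.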
By combining keys 215, 225, and 235, any passage or section of a document may be uniquely identified based on the location of the passage or section within the document. For example, using document key 215, section key 225, and passage key 235 to uniquely identify a passage, it is easy to determine the section (using section key 225) and document (using document key 215) in which the passage is located. This relationship flows in both directions.
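A minimal Python sketch of this two-way relationship follows. The `PassageId` type and the helper function names are hypothetical; the sketch only shows that a combined key yields the containing section and document directly, and that the passages of a section can be found from the section's keys.

```python
from typing import List, NamedTuple, Tuple

class PassageId(NamedTuple):
    """Combined key: document key, section key, and passage key."""
    document_key: str
    section_key: str
    passage_key: str

def containing_section(pid: PassageId) -> Tuple[str, str]:
    """From a passage, find the section that contains it."""
    return (pid.document_key, pid.section_key)

def containing_document(pid: PassageId) -> str:
    """From a passage, find the document that contains it."""
    return pid.document_key

def passages_of_section(ids: List[PassageId], document_key: str,
                        section_key: str) -> List[PassageId]:
    """From a section, find the passages located within it."""
    return [p for p in ids
            if containing_section(p) == (document_key, section_key)]

ids = [
    PassageId("NPR", "News", "Speaker A (1)"),
    PassageId("NPR", "News", "Speaker B (1)"),
    PassageId("NPR", "Sports", "Speaker C"),
]
```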
Section table 220 may include document key 215, section key 225, and miscellaneous other fields 420. Miscellaneous other fields 420 may include fields relating to the start time of the section, the duration of the section, and/or the language in which the section was created. Passage table 230 may include document key 215, section key 225, passage key 235, and miscellaneous other fields 430. Miscellaneous other fields 430 may include fields relating to the start time of the passage, the duration of the passage, the name of a speaker in the passage, the gender of a speaker in the passage, and/or the language in which the passage was created.
In other implementations consistent with the principles of the invention, database 140 may include additional tables to facilitate the searching and retrieval of data.
Topic labels table 520 may include document key 215, section key 225, topic key 620, and miscellaneous other fields 630. Topic labels table 520 may include document key 215, section key 225, and topic key 620 as its primary key. Document key 215 and section key 225 are also foreign keys because they are primary keys for other tables. Miscellaneous other fields 630 may include topics, ranks, and scores relating to the section identified by section key 225 and/or the document identified by document key 215.
Named entity table 530 may include document key 215, section key 225, passage key 235, named entity (NE) key 640, and miscellaneous other fields 650. Named entity table 530 may include document key 215, section key 225, passage key 235, and named entity key 640 as its primary key. Document key 215, section key 225, and passage key 235 are also foreign keys because they are primary keys for other tables. Miscellaneous other fields 650 may include the type of named entity. A named entity may refer to a person, place, or organization within the passage identified by passage key 235, the section identified by section key 225, and/or the document identified by document key 215.
Facts table 540 may include document key 215, section key 225, passage key 235, named entity key 640, and miscellaneous other fields 660. Facts table 540 may include document key 215, section key 225, passage key 235, and named entity key 640 as its primary key. Document key 215, section key 225, passage key 235, and named entity key 640 are also foreign keys because they are primary keys for other tables. Miscellaneous other fields 660 may include factual information, regarding the named entity identified by named entity key 640, that answers questions, such as who did what where, who said what, and where is that.
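The additional tables might be declared as in the following Python sketch, which again uses the built-in `sqlite3` module. All table names, column names (for example `topic_rank`, `ne_type`, and `fact`), and the `create_extended_database` helper are assumptions chosen for illustration; the description lists the kinds of fields but not their names or types.

```python
import sqlite3

# Base tables, declared compactly so that the foreign keys below resolve.
BASE = """
CREATE TABLE document (document_key TEXT PRIMARY KEY);
CREATE TABLE section (
    document_key TEXT, section_key TEXT,
    PRIMARY KEY (document_key, section_key),
    FOREIGN KEY (document_key) REFERENCES document(document_key));
CREATE TABLE passage (
    document_key TEXT, section_key TEXT, passage_key TEXT,
    PRIMARY KEY (document_key, section_key, passage_key),
    FOREIGN KEY (document_key, section_key)
        REFERENCES section(document_key, section_key));
"""

# Additional tables; each primary key extends the keys of a base table.
AUX = """
CREATE TABLE topic_labels (
    document_key TEXT, section_key TEXT, topic_key TEXT,
    topic TEXT, topic_rank INTEGER, score REAL,
    PRIMARY KEY (document_key, section_key, topic_key),
    FOREIGN KEY (document_key, section_key)
        REFERENCES section(document_key, section_key));
CREATE TABLE named_entity (
    document_key TEXT, section_key TEXT, passage_key TEXT, ne_key TEXT,
    ne_type TEXT,
    PRIMARY KEY (document_key, section_key, passage_key, ne_key),
    FOREIGN KEY (document_key, section_key, passage_key)
        REFERENCES passage(document_key, section_key, passage_key));
CREATE TABLE facts (
    document_key TEXT, section_key TEXT, passage_key TEXT, ne_key TEXT,
    fact TEXT,
    PRIMARY KEY (document_key, section_key, passage_key, ne_key),
    FOREIGN KEY (document_key, section_key, passage_key, ne_key)
        REFERENCES named_entity(document_key, section_key,
                                passage_key, ne_key));
"""

def create_extended_database():
    """Create an in-memory database with the base and additional tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign keys
    conn.executescript(BASE + AUX)
    return conn
```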
Processing may begin with a document being identified (act 710). This document identification might include obtaining an audio, video, or text document from multimedia sources 110. As described above, a document may include a span of media from beginning to end or from time A to time B. Assume that the document relates to an audio input stream from NPR Morning Edition on Feb. 7, 2002, from 6:00 a.m. to 6:30 a.m. eastern.
Returning to
Passages within a section may be differentiated by their linguistic or structural properties. In the example of
Document key 215, section key(s) 225, and passage key(s) 235 may be generated for the document and each of the identified sections and passages within the document (act 730). Document key 215 may uniquely identify the document. In the example of
A record in document table 210 of database 140 may be created for the new document. Document key 215 and, possibly, other document-related information may then be stored in the record within document table 210 (act 740). The other document-related information may include data relating to the time the document was created, the source of the document, a title of the document, the time the document started, the duration of the document, the region, subregion, and country in which the document originated, and/or the language in which the document was created.
Record(s) in section table 220 of database 140 may be created for the identified section(s) of the document. Document key 215, section key 225, and, possibly, other section-related information may then be stored in each of the records within section table 220 (act 750). The other section-related information may include data relating to the start time of the section, the duration of the section, and/or the language in which the section was created.
Record(s) in passage table 230 of database 140 may be created for the identified passage(s) of the document. Document key 215, section key 225, passage key 235, and, possibly, other passage-related information may then be stored in each of the records within passage table 230 (act 760). The other passage-related information may include data relating to the start time of the passage, the duration of the passage, the name of a speaker in the passage, the gender of a speaker in the passage, and/or the language in which the passage was created.
Passage table 230 includes seven records: records 920-950. Record 920 includes “NPR” document key 215 and “News” section key 225, and stores “Speaker A (1)” as its passage key 235. Record 925 includes “NPR” document key 215 and “News” section key 225, and stores “Speaker B (1)” as its passage key 235. Record 930 includes “NPR” document key 215 and “News” section key 225, and stores “Speaker A (2)” as its passage key 235. Record 935 includes “NPR” document key 215 and “News” section key 225, and stores “Speaker B (2)” as its passage key 235. Record 940 includes “NPR” document key 215 and “Sports” section key 225, and stores “Speaker C” as its passage key 235. Record 945 includes “NPR” document key 215 and “Sports” section key 225, and stores “Speaker D” as its passage key 235. Record 950 includes “NPR” document key 215 and “Sports” section key 225, and stores “Speaker E” as its passage key 235.
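The storing acts described above (acts 740-760) can be sketched in Python as follows. For brevity the three tables are modelled as plain lists of tuples rather than as a relational database, and the `load_document` function and the shape of the `metadata` dictionary are hypothetical. The key values are those of the example above.

```python
def load_document(metadata, tables):
    """Store the keys for one analyzed document in the three tables."""
    doc_key = metadata["document"]
    # Act 740: one record in the document table.
    tables["document"].append((doc_key,))
    for section_key, passage_keys in metadata["sections"].items():
        # Act 750: each section record also carries the document key.
        tables["section"].append((doc_key, section_key))
        for passage_key in passage_keys:
            # Act 760: each passage record carries both parent keys.
            tables["passage"].append((doc_key, section_key, passage_key))
    return tables

metadata = {
    "document": "NPR",
    "sections": {
        "News": ["Speaker A (1)", "Speaker B (1)",
                 "Speaker A (2)", "Speaker B (2)"],
        "Sports": ["Speaker C", "Speaker D", "Speaker E"],
    },
}

tables = load_document(
    metadata, {"document": [], "section": [], "passage": []})
```

Loading this metadata produces one document record, two section records, and seven passage records.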
Returning to
Systems and methods consistent with the present invention provide multimedia information management in a manner that treats different types of data items (e.g., audio, video, and text) the same for storage and retrieval purposes. A set of keys (document, section, and passage) is chosen that is common to all of the data item types. All of the data items are assigned relative to the keys.
For any document, section of a document, or passage of a section, data features that are bounded within the same region may be easily extracted. For a section, for example, data features, such as named entities, time/offset codes (i.e., the time or place at which a word occurs), and extracted facts, can be easily retrieved. Similarly, for a name in a paragraph of text, the section and/or document containing that text may be easily located and retrieved.
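Both kinds of retrieval can be illustrated with a short Python sketch using the built-in `sqlite3` module. The table layout, the sample rows, and the function names `entities_in_section` and `regions_mentioning` are assumptions made for this example; the named entities shown are hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE named_entity ("
    " document_key TEXT, section_key TEXT, passage_key TEXT,"
    " ne_key TEXT, ne_type TEXT,"
    " PRIMARY KEY (document_key, section_key, passage_key, ne_key))"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO named_entity VALUES (?, ?, ?, ?, ?)",
    [
        ("NPR", "News", "Speaker A (1)", "Enron", "organization"),
        ("NPR", "News", "Speaker B (1)", "Houston", "place"),
        ("NPR", "Sports", "Speaker C", "Boston", "place"),
    ],
)

def entities_in_section(conn, document_key, section_key):
    """Retrieve the named entities bounded within one section."""
    rows = conn.execute(
        "SELECT ne_key FROM named_entity"
        " WHERE document_key = ? AND section_key = ? ORDER BY ne_key",
        (document_key, section_key),
    )
    return [r[0] for r in rows]

def regions_mentioning(conn, ne_key):
    """Locate the (document, section) pairs that contain a given name."""
    rows = conn.execute(
        "SELECT DISTINCT document_key, section_key FROM named_entity"
        " WHERE ne_key = ? ORDER BY document_key, section_key",
        (ne_key,),
    )
    return rows.fetchall()
```

Because every record carries the document and section keys, neither query needs to know whether the underlying media was audio, video, or text.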
The foregoing description of preferred embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention.
For example, three main tables (document, section, and passage) have been described. In other implementations consistent with the principles of the invention, a fourth table may be included that defines the time or offset at which words occur in a document. In the case of audio or video data, the time/offset may identify the time at which a word was spoken. In the case of text, the time/offset may identify the character offset of a word. The time/offset table may include a combination of document key 215, section key 225, passage key 235, and a time/offset key as its primary key.
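One possible form of such a time/offset table is sketched below in Python with the built-in `sqlite3` module. The table name `word_offset`, its column names, the sample words and times, and the `words_in_passage` helper are all assumptions made for illustration. For audio or video the `time_offset` column would hold a time in seconds; for text it would hold a character offset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_offset ("
    " document_key TEXT, section_key TEXT, passage_key TEXT,"
    " time_offset REAL, word TEXT,"
    " PRIMARY KEY (document_key, section_key, passage_key, time_offset))"
)
# Hypothetical words and the times (in seconds) at which they were spoken.
conn.executemany(
    "INSERT INTO word_offset VALUES (?, ?, ?, ?, ?)",
    [
        ("NPR", "News", "Speaker A (1)", 0.4, "morning"),
        ("NPR", "News", "Speaker A (1)", 0.0, "good"),
        ("NPR", "News", "Speaker B (1)", 5.2, "thanks"),
    ],
)

def words_in_passage(conn, document_key, section_key, passage_key):
    """Return the words of one passage in the order they occur."""
    rows = conn.execute(
        "SELECT word FROM word_offset"
        " WHERE document_key = ? AND section_key = ? AND passage_key = ?"
        " ORDER BY time_offset",
        (document_key, section_key, passage_key),
    )
    return [r[0] for r in rows]
```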
No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. The scope of the invention is defined by the claims and their equivalents.
This application claims priority under 35 U.S.C. §119 based on U.S. Provisional Application Nos. 60/394,064 and 60/394,082, filed Jul. 3, 2002, and Provisional Application No. 60/419,214, filed Oct. 17, 2002, the disclosures of which are incorporated herein by reference.
The U.S. Government may have a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contract No. N66001-00-C-8008 awarded by the Defense Advanced Research Projects Agency.
Number | Date | Country |
---|---|---|
20040006576 A1 | Jan 2004 | US |
Number | Date | Country |
---|---|---|
60419214 | Oct 2002 | US |
60394082 | Jul 2002 | US |
60394064 | Jul 2002 | US |