Generating topic-specific language models

Description

BACKGROUND

Automated speech recognition uses a language model to identify the most likely candidate matching a word or expression used in a natural language context. In many instances, the language model used is built using a generic corpus of text and might not offer the most accurate or optimal representation of natural language for a given topic. For example, in a scientific context, the word “star” may be less likely to follow the phrase “country music” than in an entertainment context. Accordingly, when evaluating an audio signal relating to science, a speech recognition system may achieve more accurate results using a language model specific to the topic of science, rather than a generic language model.

BRIEF SUMMARY

The following presents a simplified summary of the disclosure in order to provide a basic understanding of some aspects. It is not intended to identify key or critical elements or to delineate the scope. The following summary merely presents some concepts of the disclosure in a simplified form as a prelude to the more detailed description provided below.

According to one or more aspects, a speech recognition system may automatically generate a topic specific language model and recognize words in a speech signal using the generated model. For example, a speech recognition system may initially determine words in an audio speech signal using a basic or generic language model. A language model, as used herein, generally refers to a construct that defines probabilities of words appearing after another word or set of words (or within a predefined proximity of another word). The speech recognition system may use the determined words to identify one or more topics associated with the speech signal and use the identified topics to obtain a corpus of text relating to those topics. The corpus of text allows the speech recognition system to create a topic specific language model by, in one example, modifying or adapting the basic or generic language model according to the probabilities and language structure presented in the topic specific corpus of text. A second speech recognition pass may then be performed using the topic specific language model to enhance the accuracy of speech recognition. In one or more arrangements, the topic specific language model may be generated on-the-fly, thereby eliminating the need to pre-generate language models prior to receiving or beginning processing of an audio signal.

According to another aspect, collecting a corpus of topic specific text may include generating one or more search queries and using those search queries to identify articles, publications, websites and other documents and files. In one example, the search queries may be entered into a search engine such as GOOGLE or PUBMED. Text may then be extracted from each of the results returned from the search. In one or more arrangements, a corpus collection module may further clean the text by removing extraneous or irrelevant data such as bylines, advertisements, images, formatting codes and information and the like. The corpus collection module may continue to collect text until a specified threshold has been reached.

According to another aspect, multiple queries may be generated for corpus collection. For example, a speech recognition system or text collection module may generate multiple queries for a single topic to increase the amount of text returned. Alternatively or additionally, an audio signal may include multiple topics. Accordingly, at least one query may be generated for each of the multiple topics to ensure that the corpus of text collected is representative of the audio signal.

According to yet another aspect, the corpus of text collected may be representative of a distribution of topics associated with the speech signal. Stated differently, a speech signal may include a variety of topics, each topic having a degree of emphasis or significance in that speech signal. The corpus of text may include amounts of text that have been collected based on that distribution of topic significance or emphasis. In one example, the number of words or phrases associated with a topic may be used as a measure of its significance in a speech signal. A threshold number of words may then be divided according to the significance.

The details of these and other embodiments are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated by way of example and not limitation in the accompanying figures in which like reference numerals indicate similar elements and in which:

FIG. 1 illustrates an example network distribution system in which content items may be provided to subscribing clients.

FIG. 2 illustrates an example speech recognition system configured to identify words in an audio signal based on a topic specific language model according to one or more aspects described herein.

FIG. 3 illustrates an example segment of an audio signal that may be processed using a speech recognition system according to one or more aspects described herein.

FIG. 4 illustrates an example listing of meaningful words according to one or more aspects described herein.

FIG. 5 illustrates an example keyword table according to one or more aspects described herein.

FIG. 6 is a flowchart illustrating an example method for creating a topic specific language model and using the topic specific language model to perform speech recognition on an audio signal according to one or more aspects described herein.

FIG. 7 is a flowchart illustrating an example method for collecting a corpus of text for creating a topic specific language model according to one or more aspects described herein.

DETAILED DESCRIPTION

FIG. 1 illustrates a content processing and distribution system 100 that may be used in connection with one or more aspects described herein. The distribution system 100 may include a headend 102, a network 104, set top boxes (STB) 106 and corresponding receiving devices (e.g., receiver, transceiver, etc.) 108. The distribution system 100 may be used as a media service provider/subscriber system wherein the provider (or vendor) generally operates the headend 102 and the network 104 and also provides a subscriber (e.g., client, customer, service purchaser, user, etc.) with the STB 106.

The STB 106 is generally located at the subscriber location such as a subscriber's home, a tavern, a hotel room, a business, etc., and the receiving device 108 is generally provided by the subscribing client. The receiving device 108 may include a television, high definition television (HDTV), monitor, host viewing device, MP3 player, audio receiver, radio, communication device, personal computer, media player, digital video recorder, game playing device, etc. The device 108 may be implemented as a transceiver having interactive capability in connection with the STB 106, the headend 102 or both the STB 106 and the headend 102. Alternatively, STB 106 may include a cable modem for computers for access over cable.

The headend 102 is generally electrically coupled to the network 104, the network 104 is generally electrically coupled to the STB 106, and each STB 106 is generally electrically coupled to the respective device 108. The electrical coupling may be implemented as any appropriate hard-wired (e.g., twisted pair, untwisted conductors, coaxial cable, fiber optic cable, hybrid fiber cable, etc.) or wireless (e.g., radio frequency, microwave, infrared, etc.) coupling and protocol (e.g., Home Plug, HomePNA, IEEE 802.11(a-b), Bluetooth, HomeRF, etc.) to meet the design criteria of a particular application. While the distribution system 100 is illustrated showing one STB 106 coupled to one respective receiving device 108, each STB 106 may be configured with the capability of coupling to more than one device 108.

The headend 102 may include a plurality of devices 110 (e.g., devices 110a-110n) such as data servers, computers, processors, security encryption and decryption apparatuses or systems, and the like configured to provide video and audio data (e.g., movies, music, television programming, games, and the like), processing equipment (e.g., provider operated subscriber account processing servers), television service transceivers (e.g., transceivers for standard broadcast television and radio, digital television, HDTV, audio, MP3, text messaging, gaming, etc.), and the like. At least one of the devices 110 (e.g., a sender security device 110x), may include a security system.

In one or more embodiments, network 104 may further provide access to a wide area network (WAN) 112 such as the Internet. Accordingly, STB 106 or headend 102 may have access to content and data on the wide area network. Content items may include audio, video, text and/or combinations thereof. In one example, a service provider may allow a subscriber to access websites 114 and content providers 116 connected to the Internet (e.g., WAN 112) using the STB 106. Websites 114 may include news sites, social networking sites, personal webpages and the like. In another example, a service provider (e.g., a media provider) may supplement or customize media data sent to a subscriber's STB 106 using data from the WAN 112.

Alternatively or additionally, one or more other computing devices 118 may be used to access either media distribution network 104 or wide area network 112.

Using networks such as those illustrated and described with respect to FIG. 1, a speech recognition device and/or system may access a corpus of information that relates to a specific topic or set of topics to refine and build a topic specific language model. The topic specific language model may be better honed to identify the spoken words used in natural language associated with the identified topic or topics. In one or more examples, a speech recognition device may search for articles and other textual material relating to a topic from various content sources such as content providers 116 and websites 114 of FIG. 1. The speech recognition device may then generate a language model based thereon, as described in further detail herein.

FIG. 2 illustrates an example speech recognition device configured to generate a language model based on a particular topic. Initially, natural language data such as audio is received by speech recognizer module 205 of speech recognition device 200 to identify an initial set of words contained in the audio based on a generic language model stored in database 210. A generic language model may be created using a generic corpus of text and might not be specific to any particular topic. Speech recognizer module 205 may include software, hardware, firmware and/or combinations thereof such as SCANSOFT's DRAGON NATURALLY SPEAKING speech recognition software.

From the initial set of identified words, topic extractor 215 is configured to identify one or more topics associated with the natural language data. Topics may be identified from the initial set of words in a variety of ways including by determining a frequency of words used, identification of meaningful vs. non-meaningful words, determining a type of word (e.g., noun, verb, etc.) and/or combinations thereof. For example, words that are used most frequently might be treated as being indicative of a topic of the audio. In another example, meaningful words might be predefined and identified in the natural language data. Accordingly, topic extractor 215 may eliminate non-meaningful words such as “the” or “of” from topic consideration even if such words appear relatively frequently. In one example, stop word lists or noise word lists may be used to filter out non-meaningful words. Stop word lists and other types of word filtering lists may be topic-specific or may be universal for all topics.
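
The frequency-based approach with stop word filtering can be sketched as follows. This is a minimal illustration in Python, not the disclosed implementation: the stop word list, the function name `candidate_topics` and the sample transcript are all assumptions introduced for the example.

```python
from collections import Counter

# Hypothetical universal stop word list; a real system might use a much
# larger list, or a topic-specific one.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "was"}

def candidate_topics(words, top_n=3):
    """Return the top_n most frequent non-stop words from a first-pass transcript."""
    counts = Counter(w.lower() for w in words if w.lower() not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical first-pass output.
first_pass = "the star of the movie was the star of the show".split()
topics = candidate_topics(first_pass, top_n=2)
```

Although "the" is the most frequent word in the sample, it never reaches the candidate list because it is filtered before counting.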

In some arrangements, speech recognizer module 205 might not perform a first pass on the natural language to identify the initial set of words. Instead, topic extractor 215 may be configured to identify topics associated with the natural language based on other information such as metadata. For example, if speech recognition device 200 is processing audio stored in an audio file, topic extractor 215 may extract topics from metadata included in the audio file such as a genre, artist, subject and title. If the audio file is located on a webpage, topic extractor 215 may use page or site data extracted from the webpage for topic determination. Alternatively or additionally, a combination of metadata and the initial set of recognized words may be used to identify topics to which the audio relates. A topic may include any number of words and in some instances, may include phrases.

Once topic extractor 215 has outputted the topic(s) of the natural language data, a query generator 225 of a corpus collector module 220 is configured to create search queries for obtaining a corpus of text relating to the identified topics. In one example, the query generator 225 may create search queries for a search engine 235 such as GOOGLE. In another example, query generator 225 may formulate queries for identifying publications in a database such as PUBMED. Queries may be formed using the identified topic words or phrases in a keyword search. Alternatively or additionally, speech recognition device 200 may maintain a definition or meaning table in database 210 to provide further keywords that may be used in a search query. For example, the word “rocket” may be associated with additional key words and phrases “weapon,” “propulsion,” “space shuttle” and the like. Accordingly, multiple search query strings may be formed using various combinations of the topic words and associated keywords.

Articles and other text identified through the search query may then be fed from corpus collector module 220 into a language model generator 230 that creates a language model specific to the topic or topics identified by topic extractor 215. Language models, as used herein, generally refer to data constructs configured to represent a probability of a sequence of words appearing together. Various types of language models may include n-gram language models which specify the probability of a set of n words appearing together (sometimes in a certain sequence). In one example, a language model may indicate that the word “friend” is more likely to appear immediately after the word “best” than immediately after the word “chest” in an n-gram language model, where n=2. Accordingly, a speech recognition device such as device 200 may be able to ascertain whether an utterance (e.g., a spoken word or sound in an audio signal) corresponds to the word “chest” or “best” based on the following word (e.g., “friend”). Thus, a language model allows a device or a user to determine the odds that a speech signal includes word or phrase x.
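
The “best friend” versus “chest friend” comparison can be illustrated with a small bigram (n=2) model. The sketch below assumes a plain maximum-likelihood estimate with no smoothing; the function names and the sample sentence are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Estimate P(next | previous) by maximum likelihood from a token list."""
    pair_counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        pair_counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(followers.values()) for w, c in followers.items()}
        for prev, followers in pair_counts.items()
    }

def bigram_prob(model, prev, nxt):
    """Look up P(nxt | prev); unseen pairs get probability 0.0 (no smoothing)."""
    return model.get(prev, {}).get(nxt, 0.0)

# Hypothetical training text.
corpus = "my best friend opened the chest and my best friend smiled".split()
model = train_bigram_model(corpus)
```

With this toy corpus, `bigram_prob(model, "best", "friend")` is larger than `bigram_prob(model, "chest", "friend")`, so a recognizer choosing between the two acoustically similar words would favor “best” when the next word is “friend.”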

To create the topic specific language model, language model generator 230 may modify a basic language model in accordance with the probabilities determined from the text collected by corpus collector 220 (as discussed in further detail herein). Thus, probabilities of certain word combinations or n-grams may be modified based on their frequency of occurrence in the collected corpus of text. Using this topic specific language model, speech recognition device 200 may perform a second pass on the natural language to identify the words used in the speech.

FIG. 3 illustrates an example segment of an audio speech signal from which one or more topics may be extracted. Segment 300 may represent a speech signal from a television show or some other audio clip, for instance. From segment 300, topics such as movies, Actor X and sci-fi may be extracted based on frequency of words associated with those topics, definition of meaningful vs. non-meaningful words and the like. FIG. 4, for example, illustrates a list 400 of predefined meaningful words that may be evaluated in determining a topic of speech. Accordingly, because “movie” appears in segment 300, a speech recognition device (e.g., device 200 of FIG. 2) may evaluate whether movies is a topic of segment 300. Words or phrases not in list 400 might be discarded from topic consideration.

Frequency, on the other hand, corresponds to the number of times a word or topic appears in a segment of speech. In some instances, a topic may correspond to multiple words. Accordingly, even though segment 300 includes only 1 mention of the word “movie,” a frequency assigned to the topic of movies may have a value of 2 in view of the use of the phrase “big screen,” a known colloquialism for movies. In one or more configurations, a word or phrase may be extracted as a topic if the determined frequency is above a certain threshold. The threshold may be defined manually, automatically or a combination thereof. In one example, topics may be identified from the three words or phrases used most frequently in segment 300. Thus, the threshold may be defined as the frequency of the least frequent word or phrase of the top three most frequently used words or phrases. According to one or more arrangements, frequency might only be evaluated upon determining that a word or phrase falls into the category of a meaningful word or phrase.
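
One way to combine the meaningful-word list with a top-three frequency threshold is sketched below. The term-to-topic mapping, the sample sentence and the function names are assumptions made for this example (the actual content of segment 300 is not reproduced here), and simple substring counting is used for brevity.

```python
from collections import Counter

# Hypothetical mapping from meaningful words or phrases to topics,
# in the spirit of list 400. "big screen" is treated as a colloquialism for movies.
TOPIC_TERMS = {
    "movie": "movies",
    "big screen": "movies",
    "sci-fi": "sci-fi",
    "actor x": "Actor X",
}

def topic_frequencies(text):
    """Count how often each topic's terms occur in the text (substring match)."""
    lowered = text.lower()
    counts = Counter()
    for term, topic in TOPIC_TERMS.items():
        counts[topic] += lowered.count(term)
    return counts

def topics_above_threshold(counts, top_n=3):
    """Set the threshold to the frequency of the top_n-th topic and keep topics at or above it."""
    ranked = [c for _, c in counts.most_common(top_n) if c > 0]
    if not ranked:
        return []
    threshold = ranked[-1]
    return [t for t, c in counts.items() if c >= threshold]

# Hypothetical speech segment.
segment = "Actor X returns to the big screen in a new sci-fi movie."
freqs = topic_frequencies(segment)
```

Here the topic of movies receives a frequency of 2 even though the word “movie” appears once, because “big screen” maps to the same topic.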

FIG. 5 illustrates an example of a keyword table storing lists of keywords in association with various topic words. Topic words 501 may be listed in one section of table 500, while associated keywords 503 may be provided in another section. Example topic words 501 may include “food,” “sports,” “football,” and “photography.” Topic word food 501a may be associated with keywords or key phrases 503a such as “meal,” “lunch,” “dinner,” and “hot dogs.” Topic word sports 501b, on the other hand, may be associated with keywords or phrases 503b that include “athletic activity,” “competition,” “football,” “hockey” and the like. Using the keywords and key phrases specified in table 500, search queries may be formed for retrieving text corresponding to a particular topic. In one example, if a speech recognition device wants to retrieve articles associated with sports, the device may generate a search string such as “athletic activity competition articles.” Note that in this example, the word articles may be tacked onto the end of the query to limit the types of results returned (e.g., articles rather than photo galleries).
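
Forming search strings from such a table might look like the following sketch. The dictionary mirrors the example entries of table 500; the function `build_queries`, its parameters and the choice of pairing two keywords per query are assumptions for illustration.

```python
from itertools import combinations

# Keyword table modeled on table 500, using the example entries from the text.
KEYWORD_TABLE = {
    "food": ["meal", "lunch", "dinner", "hot dogs"],
    "sports": ["athletic activity", "competition", "football", "hockey"],
}

def build_queries(topic, terms_per_query=2, suffix="articles"):
    """Combine a topic's keywords into query strings, appending a result-type suffix."""
    keywords = KEYWORD_TABLE.get(topic, [])
    return [
        " ".join(combo + (suffix,))
        for combo in combinations(keywords, terms_per_query)
    ]

queries = build_queries("sports")
```

The first generated query is “athletic activity competition articles,” matching the search string given above; the remaining combinations provide additional queries if more text is needed.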

FIG. 6 illustrates an example method for building a topic specific language model and performing speech recognition using the topic specific language model. In step 600, a speech recognition system may receive a speech signal from an audio source. The audio source may include an audio data file, an audio/video file that includes an audio track, a line-in input (e.g., a microphone input device) and the like. In step 605, the speech recognition system subsequently performs a first speech recognition pass over the received audio/speech signal using a generic or basic language model. In some instances, the first speech recognition pass might only return words that have been recognized with a specified level of confidence (e.g., 95%, 99% or the like). The speech recognition system may then determine topics from the returned words recognized from the first pass over the audio signal in step 610.
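
Restricting the first pass to confidently recognized words can be sketched as a simple filter. The (word, confidence) pair format, the function name and the sample values are assumptions; actual recognizers expose confidence scores in engine-specific ways.

```python
def confident_words(hypotheses, min_confidence=0.95):
    """Keep only words whose recognition confidence meets the minimum."""
    return [word for word, confidence in hypotheses if confidence >= min_confidence]

# Hypothetical first-pass output as (word, confidence) pairs.
first_pass = [("cosmic", 0.99), ("dust", 0.97), ("mouse", 0.41), ("nebula", 0.96)]
reliable = confident_words(first_pass)
```

Only the words retained in `reliable` would then be passed on for topic determination in step 610.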

Using the determined topics, the speech recognition system may subsequently generate one or more search queries to identify a corpus of text relevant to the determined topics in step 615. For example, search queries may be created by assembling known keywords associated with or describing the specified topic, as described herein. In response to the search query, the speech recognition system may receive a plurality of search results in step 620. These search results may include multiple types of information including articles, blogs, text from images, metadata, and text from a webpage and may be received from various databases and search engines. Text from each of the search results may then be extracted and collected in step 625. In step 630, the system may determine whether a sufficient number of words has been collected from the search results. The determination may be made by comparing the number of words collected with a specified threshold number of words. The threshold number of words may be, for example, 100,000, 200,000, 1,000,000 or 10,000,000. If the collector module has collected an insufficient number of words, the module may repeat steps 615-625 to obtain more words. For instance, the collector module may generate a new search query or, alternatively or additionally, extract words from additional search results not considered in the first pass.

If, on the other hand, the collector module has obtained a sufficient number of words from the search results, the system may generate a topic specific language model in step 635 using the corpus of text collected. The system may, for example, adapt or revise a basic or generic language model based on the corpus of topic specific text retrieved. By way of example, assume that a generic or initial language model shows the probability of the word “dust” immediately following the word “cosmic” to be 30% and the probability of the word “dust” immediately following the word “house” to be 70%. Assuming that at least one of the topics in the corpus collection and, correspondingly, the speech to be recognized is space, the corpus of topic specific text may show that the probability that the word “dust” appears immediately after the word “cosmic” is 80% versus 20% for “dust” immediately appearing after “house.” Accordingly, the speech recognition system may modify the language model to reflect the probabilities determined based on the corpus of topic specific text. Alternatively, the speech recognition system may average the percentages. For example, the average of the two probabilities of “dust” following “cosmic” may result in a 55% probability while the average for “dust” following “house” may average out to 45%. Other algorithms and methods for adjusting a basic language model to produce the topic specific language model may be used. The above example is merely used to illustrate some aspects of the disclosure and is simplified. Language models generally include a greater number of possible word combinations (e.g., many other words may immediately precede the word “dust”) and probabilities than discussed in the example above.
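
Both adjustment strategies in the example (replacing the generic probabilities or averaging them with the topic specific ones) can be expressed as a linear interpolation with different weights. The sketch below is one possible formulation, not the only one contemplated; the function name `adapt`, the `weight` parameter and the dictionary layout keyed by (previous word, next word) are assumptions, and no renormalization is performed.

```python
def adapt(generic, topical, weight=0.5):
    """Linearly interpolate two probability tables keyed by (previous, next) word pairs.

    weight=1.0 replaces the generic values with the topical ones;
    weight=0.5 averages the two, as in the "cosmic dust" example.
    Pairs missing from one table are treated as probability 0.0 there.
    """
    keys = set(generic) | set(topical)
    return {
        k: (1 - weight) * generic.get(k, 0.0) + weight * topical.get(k, 0.0)
        for k in keys
    }

generic = {("cosmic", "dust"): 0.30, ("house", "dust"): 0.70}
topical = {("cosmic", "dust"): 0.80, ("house", "dust"): 0.20}

averaged = adapt(generic, topical)               # roughly 0.55 and 0.45
replaced = adapt(generic, topical, weight=1.0)   # 0.80 and 0.20
```

Intermediate weights allow the topic specific corpus to influence the model more or less strongly, for instance depending on how much text was collected.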

Once the topic specific language model has been created, the speech recognition system may perform a second pass over the speech to make a final identification of the words spoken in step 640. The words identified in the second pass may be used for a variety of purposes including automatic transcription of recorded audio, creating a document by speaking the words rather than by typing, data entry and the like.
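
The overall flow of FIG. 6 can be summarized as a two-pass pipeline. In the sketch below every component is passed in as a callable, and the stubs used to exercise it are purely illustrative placeholders rather than real recognizer or search interfaces.

```python
def two_pass_recognition(audio, recognize, extract_topics, collect_corpus,
                         adapt_model, generic_model):
    """Run a generic first pass, build a topic specific model, then run a second pass."""
    first_pass_words = recognize(audio, generic_model)   # step 605
    topics = extract_topics(first_pass_words)            # step 610
    corpus = collect_corpus(topics)                      # steps 615-630
    topic_model = adapt_model(generic_model, corpus)     # step 635
    return recognize(audio, topic_model)                 # step 640

# Stub components, purely illustrative.
result = two_pass_recognition(
    audio="<audio bytes>",
    recognize=lambda audio, model: f"words decoded with {model}",
    extract_topics=lambda words: ["space"],
    collect_corpus=lambda topics: [f"text about {t}" for t in topics],
    adapt_model=lambda model, corpus: f"{model}+topic({len(corpus)} docs)",
    generic_model="generic",
)
```

Because the same `recognize` callable is used for both passes, only the language model differs between the first and second pass.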

FIG. 7 illustrates an example method for collecting a corpus of topic specific text. In step 700, a topic specific query may be created. In step 705, the query may be executed in a search engine to identify one or more groups of text such as articles, websites, press releases and the like. In step 710, the corpus collection module may extract and enqueue a source identifier or location (e.g., a URI or URL) of the text files or documents matching the search query. In step 715, the corpus collection module may extract text from each document or file identified in the search in accordance with the queue and convert the text into raw text. Raw text may include the characters forming the words and phrases with formatting and other extraneous information such as metadata removed. In step 720, the raw text may be cleaned. In particular, words or text that do not form a part of the content of the document or article may be removed. For example, HTML files usually include several text tags or markup elements such as <BODY> </BODY> and the like. Because those tags are not part of the content of the web page or HTML site, the tags may be removed so as not to pollute the corpus of text being used to build a topic specific language model. The corpus collection module may use a dictionary of extraneous text to clean the raw text.
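
Steps 715 and 720 might be approximated with Python's standard `html.parser` module as shown below. This is a minimal sketch under stated assumptions: the class name, the set of skipped elements and the small dictionary of extraneous phrases are hypothetical, and real pages typically need more robust handling.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style> elements."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Markup never reaches this method; only text between tags does.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

# Hypothetical dictionary of extraneous text to drop during cleaning.
EXTRANEOUS = {"advertisement", "share this article"}

def html_to_clean_text(html):
    """Convert HTML to raw text (step 715) and remove extraneous entries (step 720)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    kept = [p for p in parser.parts if p.lower() not in EXTRANEOUS]
    return " ".join(kept)

page = ("<html><body><h1>Cosmic dust</h1><script>var x=1;</script>"
        "<p>Advertisement</p><p>Dust forms in stars.</p></body></html>")
text = html_to_clean_text(page)
```

The resulting `text` contains only the heading and the article sentence; the markup, the script body and the advertisement label are excluded from the corpus.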

In step 725, the corpus collection module may determine whether a threshold number of words has been collected. If so, the corpus collection module may return the current set of words as a final corpus in step 730. If, however, the corpus collection module determines that the threshold number of words has not been collected, the corpus collection module may determine whether additional pages (e.g., a webpage) or groups of search results are available in step 735. If so, the corpus collection module may repeat steps 710-720 to process one or more additional pages or groups of search results. If, however, no additional search results are available, the corpus collection module may return to step 700 to obtain text using another search query in step 740.
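
The control flow of FIG. 7 can be sketched as nested loops over queries, result pages and documents. The `search` and `fetch_text` callables and the stub data below are hypothetical stand-ins for a search engine and a document fetcher; they are not real APIs.

```python
def collect_corpus(queries, search, fetch_text, threshold):
    """Gather words until `threshold` is reached or the queries run out.

    `search(query)` is assumed to return pages (lists) of document locations, and
    `fetch_text(location)` to return cleaned raw text; both are hypothetical.
    """
    corpus = []
    for query in queries:                  # steps 700/740: create or switch query
        for page in search(query):         # steps 705/735: next page of results
            for location in page:          # step 710: queued source locations
                corpus.extend(fetch_text(location).split())  # steps 715-720
            if len(corpus) >= threshold:   # step 725: enough words collected?
                return corpus              # step 730: return final corpus
    return corpus  # all queries exhausted before reaching the threshold

# Stub data standing in for a search engine and a document store.
FAKE_RESULTS = {
    "bull run": [["doc1", "doc2"], ["doc3"]],
    "manassas battle": [["doc4"]],
}
FAKE_DOCS = {
    "doc1": "first battle of bull run",
    "doc2": "fought near manassas",
    "doc3": "in virginia in 1861",
    "doc4": "union and confederate armies",
}

corpus = collect_corpus(
    ["bull run", "manassas battle"],
    search=lambda q: FAKE_RESULTS.get(q, []),
    fetch_text=lambda loc: FAKE_DOCS[loc],
    threshold=10,
)
```

In this run the first page yields 8 words, so the second page is processed as well; the threshold is then met and the second query is never issued.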

The method of FIG. 7 may be repeated or used for each topic, topic word or topic phrase identified by a topic extractor (e.g., topic extractor 215 of FIG. 2). Each topic, topic word or topic phrase may have an associated threshold number of words that is to be collected. The threshold number for each topic, topic word or phrase may be determined by dividing a total number of words needed by the number of topics, topic words and topic phrases. Alternatively, the threshold for each query or topic may be determined based on an estimated significance of the topic so that the corpus of text is topically representative of the speech signal. Significance of a topic may be estimated, for example, by determining a number of words or phrases identified as being associated with the topic in the first speech recognition pass.
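
Dividing a total word budget among topics according to estimated significance could be done as in the sketch below. The function name, the use of integer division and the sample mention counts are assumptions for illustration.

```python
def allocate_thresholds(total_words, significance):
    """Split total_words across topics in proportion to their significance counts."""
    total = sum(significance.values())
    if total == 0:
        # Fall back to an even split when no significance estimate is available.
        return {t: total_words // len(significance) for t in significance}
    # Integer division may leave a small remainder unallocated.
    return {t: total_words * n // total for t, n in significance.items()}

# Hypothetical number of first-pass words or phrases associated with each topic.
mentions = {"movies": 6, "sci-fi": 3, "Actor X": 1}
quotas = allocate_thresholds(100_000, mentions)
```

With these counts, the topic of movies receives 60,000 of the 100,000 words, sci-fi 30,000 and Actor X 10,000, so the collected corpus mirrors the topical emphasis of the speech signal.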

In one or more arrangements, a query may include phrases or words for multiple topics of the speech signal to ensure that the results received are more likely to be relevant. For example, if a speech signal is related to the Battle of Bull Run, submitting queries using only a single word or phrase from the list of “bull,” “run,” “civil war,” “battle,” “Manassas,” and “Virginia” might produce search results that are entirely unrelated. For example, an article about anatomy of a bull may be returned. Alternatively or additionally, an article or movie review about Forrest Gump might be returned using a query that was solely focused on the word “run.” Thus, a query such as “bull run” might be used instead to identify articles, documents and the like that are more likely to be relevant to the actual topic or topics of the speech signal.

The methods and systems described herein may be used in contexts and environments other than audio signals. For example, a topic specific language model may be used to aid in optical character recognition to improve the accuracy of the characters and words identified in a particular image or document.

The methods and features recited herein may further be implemented through any number of computer readable media that are able to store computer readable instructions. Examples of computer readable media that may be used include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, DVD or other optical disk storage, magnetic cassettes, magnetic tape, magnetic storage and the like.

Additionally or alternatively, in at least some embodiments, the methods and features recited herein may be implemented through one or more integrated circuits (ICs). An integrated circuit may, for example, be a microprocessor that accesses programming instructions or other data stored in a read only memory (ROM). In some such embodiments, the ROM stores programming instructions that cause the IC to perform operations according to one or more of the methods described herein. In at least some other embodiments, one or more of the methods described herein are hardwired into an IC. In other words, the IC is in such cases an application specific integrated circuit (ASIC) having gates and other logic dedicated to the calculations and other operations described herein. In still other embodiments, the IC may perform some operations based on execution of programming instructions read from ROM or RAM, with other operations hardwired into gates and other logic of the IC. Further, the IC may output image data to a display buffer.

Although specific examples of carrying out the invention have been described, those skilled in the art will appreciate that there are numerous variations and permutations of the above-described systems and methods that are contained within the spirit and scope of the invention as set forth in the appended claims. Additionally, numerous other embodiments, modifications and variations within the scope and spirit of the appended claims will occur to persons of ordinary skill in the art from a review of this disclosure.

Claims

1. A method comprising: determining, based on a first speech recognition process associated with a first language model, a topic associated with an audio signal; performing a plurality of searches of a corpus to identify a plurality of terms related to the topic, wherein the corpus comprises a collection of text other than a transcript of the audio signal; in response to determining that the quantity of the plurality of terms identified by the searches as related to the topic matches or exceeds a threshold quantity: generating, based on the plurality of terms identified in the corpus, a second language model; and determining, based on a second speech recognition process associated with the generated second language model, the transcript of the audio signal.
2. The method of claim 1, further comprising at least one of: generating, based on the transcript, a closed-captioning feed for the audio signal; causing words of the transcript to be input into a computer application; or determining, based on the transcript, topic segments of the audio signal.
3. The method of claim 1, wherein the quantity of the plurality of terms is based on at least one of: a total quantity of terms needed to generate the second language model and a quantity of topics associated with the audio signal; or a respective significance, based on the first speech recognition process, for each of a plurality of topics associated with the audio signal.
4. The method of claim 1, wherein the second language model comprises a modification of the first language model.
5. The method of claim 1, wherein the determining the topic comprises: determining that a frequency of one or more terms, in the audio signal and associated with the topic, satisfies a frequency threshold.
6. The method of claim 1, wherein the determining the plurality of terms comprises continuing to perform searches to identify terms until corresponding search results matches or exceeds a threshold quantity.
7. The method of claim 1, wherein the one or more searches comprise at least one of: a web search; or a publication database search.
8. The method of claim 1, further comprising: determining the second language model based on the first language model.
9. The method of claim 1, wherein, in response to the quantity of the plurality of terms related to topic not meeting a threshold quantity: conducting additional searches associated with the topic to determine additional plurality of terms related to the topic.
10. The method of claim 1, wherein, performing a plurality of searches of a corpus to identify a plurality of terms related to the topic further comprises, creating the plurality of searches by assembling known keywords associated with the topic.
11. An apparatus comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: determine, based on a first speech recognition process associated with a first language model, a topic associated with an audio signal; perform a plurality of searches of a corpus to identify a plurality of terms related to the topic, wherein the corpus comprises a collection of text other than a transcript of the audio signal; in response to determining that the quantity of the plurality of terms identified by the searches as related to the topic matches or exceeds a threshold quantity: generate, based on the plurality of terms identified in corpus, a second language model; and determine, based on a second speech recognition process associated with the generated second language model, the transcript of the audio signal.
12. The apparatus of claim 11, wherein the instructions, when executed by the one or more processors, further cause the apparatus to perform at least one of: generate, based on the transcript, a closed-captioning feed for the audio signal; cause words of the transcript to be input into a computer application; or determine, based on the transcript, topic segments of the audio signal.
13. The apparatus of claim 11, wherein the quantity of the plurality of terms is based on at least one of: a total quantity of terms needed to generate the second language model and a quantity of topics associated with the audio signal; or a respective significance, based on the first speech recognition process, for each of a plurality of topics associated with the audio signal.
14. The apparatus of claim 11, wherein the second language model comprises a modification of the first language model.
15. The apparatus of claim 11, wherein the instructions, when executed by the one or more processors, further cause the apparatus to determine the topic by determining that a frequency of one or more terms, in the audio signal and associated with the topic, satisfies a frequency threshold.
16. The apparatus of claim 11, wherein the instructions, when executed by the one or more processors, further cause the apparatus to determine the plurality of terms by continuing to perform searches until corresponding search results match or exceed a threshold quantity.
17. The apparatus of claim 11, wherein the one or more searches comprise at least one of: a web search; or a publication database search.
18. The apparatus of claim 11, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: determine the second language model based on the first language model.
19. A non-transitory computer-readable medium storing instructions that, when executed, cause: determining, based on a first speech recognition process associated with a first language model, a topic associated with an audio signal; performing a plurality of searches of a corpus to identify a plurality of terms related to the topic, wherein the corpus comprises a collection of text other than a transcript of the audio signal; in response to determining that the quantity of the plurality of terms identified by the searches as related to the topic matches or exceeds a threshold quantity: generating, based on the plurality of terms identified in the corpus, a second language model; and determining, based on a second speech recognition process associated with the generated second language model, the transcript of the audio signal.
20. The non-transitory computer-readable storage medium of claim 19, wherein the instructions, when executed, further cause at least one of: generating, based on the transcript, a closed-captioning feed for the audio signal; causing words of the transcript to be input into a computer application; or determining, based on the transcript, topic segments of the audio signal.
21. The non-transitory computer-readable storage medium of claim 19, wherein the quantity of the plurality of terms is based on at least one of: a total quantity of terms needed to generate the second language model and a quantity of topics associated with the audio signal; or a respective significance, based on the first speech recognition process, for each of a plurality of topics associated with the audio signal.
22. The non-transitory computer-readable storage medium of claim 19, wherein the second language model comprises a modification of the first language model.
23. The non-transitory computer-readable storage medium of claim 19, wherein the determining the topic comprises: determining that a frequency of one or more terms, in the audio signal and associated with the topic, satisfies a frequency threshold.
24. The non-transitory computer-readable storage medium of claim 19, wherein the determining the plurality of terms comprises continuing to perform searches until corresponding search results match or exceed a threshold quantity.
25. The non-transitory computer-readable storage medium of claim 19, wherein the one or more searches comprise at least one of: a web search; or a publication database search.
26. The non-transitory computer-readable storage medium of claim 19, wherein the instructions, when executed, further cause: determining the second language model based on the first language model.
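
By way of illustration only, the following non-limiting sketch shows one way the claimed steps might be arranged in software: a topic is selected when keyword frequency in the first-pass output satisfies a threshold, searches assembled from known keywords continue until a threshold quantity of terms is collected, and a second model is derived from the first. Everything named here is an assumption made for the example and does not appear in the disclosure: the keyword table TOPIC_KEYWORDS, the caller-supplied search function, the function names, the unigram representation of a language model, and the use of linear interpolation as the adaptation technique.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical table of known keywords per topic (an assumption for this sketch).
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "astronomy": ["star", "galaxy", "telescope", "orbit"],
    "music": ["guitar", "album", "concert", "singer"],
}


def determine_topics(first_pass_words: Iterable[str],
                     frequency_threshold: int) -> List[str]:
    """Return topics whose keywords occur at least `frequency_threshold`
    times in the words produced by the first recognition pass."""
    counts = Counter(word.lower() for word in first_pass_words)
    return [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if sum(counts[k] for k in keywords) >= frequency_threshold
    ]


def collect_terms(topic: str,
                  search: Callable[[str], List[str]],
                  threshold_quantity: int,
                  max_searches: int = 10) -> Optional[List[str]]:
    """Run searches assembled from known keywords until the number of
    collected terms matches or exceeds `threshold_quantity`.
    Returns None when the threshold is never reached."""
    queries = [topic] + [f"{topic} {k}" for k in TOPIC_KEYWORDS[topic]]
    terms: List[str] = []
    for query in queries[:max_searches]:
        for text in search(query):  # `search` returns a list of text snippets
            terms.extend(text.lower().split())
        if len(terms) >= threshold_quantity:
            return terms
    return None


def adapt_unigram_model(base_model: Dict[str, float],
                        terms: List[str],
                        weight: float = 0.5) -> Dict[str, float]:
    """Build a second model by linearly interpolating the first model's
    unigram probabilities with relative frequencies from the topic terms."""
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        return dict(base_model)  # nothing to adapt with
    vocabulary = set(base_model) | set(counts)
    return {
        word: (1 - weight) * base_model.get(word, 0.0)
        + weight * counts[word] / total
        for word in vocabulary
    }
```

In such an arrangement, the second recognition pass would run only when collect_terms returns a list of terms; a result of None corresponds to the case in which the threshold quantity is not met and additional searches may be conducted.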

CROSS-REFERENCES TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 15/843,846, filed on Dec. 15, 2017, which is a continuation of U.S. patent application Ser. No. 12/496,081, filed on Jul. 1, 2009, the contents of which are hereby incorporated by reference in their entirety.

Related Publications (1)

Number	Date	Country
20200312310 A1	Oct 2020	US

Continuations (2)

Relation	Number	Date	Country
Parent	15843846	Dec 2017	US
Child	16728476		US
Parent	12496081	Jul 2009	US
Child	15843846		US

Abstract