1. Technical Field
The invention relates to speech recognition and speech directed device control. More particularly, the invention relates to a method and apparatus for the generation and augmentation of search terms from external and internal sources, in connection with speech recognition and speech directed device control.
2. Description of the Prior Art
One area of technical innovation is that of navigation of content by spoken and textual command. Such systems typically perform speech recognition by use of a grammar-based ASR (automatic speech recognition) system, where the grammar defines those terms that can be recognized. In such systems, navigated content comprises a catalog, content database, or other repository, for example: currently airing broadcast TV programs, contents of a video-on-demand (VOD) system, a catalog of cell phone ring tones, a catalog of songs, or a catalog of games. Hereafter all of the above sources of content are referred to as a repository.
Content sources are updated and/or expanded on occasion, possibly periodically, possibly as frequently as daily. In some such applications as those described above, content sources are assumed, by both system architects and system users, to reflect trends and interests in popular culture. It is therefore desirable to make content sources searchable by names of artists, popular topics, personalities, etc. However, known ASR systems recognize only those phrases that are listed in the grammar.
It would be desirable to identify names, personalities, titles, and topics that are present in a repository, and place them into a grammar. It would also be desirable to identify names, personalities, titles, and topics that are not present in the repository, and place them into a grammar; for in this way, such names, personalities, titles and topics may at least be recognized by the ASR system, which can then report that no suitable content is present in the repository.
The presently preferred embodiment of the invention provides a method and apparatus to identify names, personalities, titles, and topics that are present in a repository. A further embodiment of the invention provides a method and apparatus to identify names, personalities, titles, and topics that are not present in the repository. A key aspect of the invention uses information from external data sources, notably non-speech, text-based searches, to expand the search terms entered. The expansion takes place in two forms: (1) finding plausible linguistic variants of existing search terms that are already comprehended in the repository, but that are present under slightly different names; and (2) expanding the existing search term list with items that should be there by virtue of their currency in popular culture, but which for whatever reason have not yet been reflected with content items in the repository.
An exemplary embodiment of the invention operates as follows:
First, extract search term candidates, also referred to as candidate search terms, from external sources, for instance:
1. Published lists of frequent textual searches against popular search engines, e.g. Yahoo “top searches;”
2. Published lists of popular artists and songs, e.g. music.aol.com/songs/newsongs “Top 100 Songs;”
3. Published lists of popular tags, e.g. ETonline.com “top tags;”
4. Published lists of most-emailed stories, e.g. NYtimes.com most emailed stories, ETonline.com most emailed stories; and
5. Published news feeds, such as RSS feeds, e.g. NYtimes.com/rss.
Nominally for the first three sources listed above, the candidate search terms are clearly identified as an explicitly marked title, author, artist name, etc. and, hence, processing is purely automatic. For the final two sources listed above, a combination of automatic means, such as named entity extraction (NEE) and/or topic detection and tracking (TDT) methods, and possibly direct human intervention, is applied to the running text or titles to generate candidate search terms. However, human intervention may be used with the first group as well.
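The two handling paths described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the capitalized-phrase pattern is a crude stand-in for a real NEE or TDT method, which the text does not specify.

```python
import re

def terms_from_marked_list(entries):
    """Explicitly marked entries (titles, artist names) only need whitespace cleanup."""
    return [" ".join(e.split()) for e in entries if e.strip()]

# Assumption: a run of two or more capitalized words is treated as a named entity.
# A production system would use a real NEE/TDT component instead of this heuristic.
_CAPITALIZED_RUN = re.compile(r"\b[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)+")

def terms_from_running_text(text):
    """Pull candidate search terms out of unmarked running text."""
    return [m.group(0) for m in _CAPITALIZED_RUN.finditer(text)]

marked = terms_from_marked_list(["  Top   Song Title ", ""])
running = terms_from_running_text(
    "Fans lined up to see Taylor Swift perform in New York last night."
)
```

The heuristic misses single-word names and lowercase topics, which is one reason the text allows for human intervention on running-text sources.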
Next, extract verified search terms from internal sources, for instance:
1. Explicitly marked titles, authors, artist names, etc. that are associated with the content elements in the repository; and/or
2. Sources derived by application of named entity extraction (NEE) and/or topic detection and tracking (TDT) methods to descriptive text associated with the content elements in the repository.
In the presently preferred embodiment of the invention, typical (although not exclusive) means of NEE and TDT analysis may be found in:
Next, match candidate search terms against verified search terms by well-known linguistic edit distance techniques, to obtain plausible linguistic variants of verified search terms, used to generate the augmented verified search terms.
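As an illustration of this matching step, the sketch below uses Levenshtein distance, one common edit distance measure; the text does not name a specific technique. The function names and the `max_ratio` threshold are assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rows of dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def find_variants(candidates, verified, max_ratio=0.2):
    """Map each verified term to the candidates that are close, but not identical, to it.

    max_ratio is a hypothetical tuning knob: distance divided by the longer length.
    """
    variants = {}
    for v in verified:
        v_norm = v.casefold()
        for c in candidates:
            c_norm = c.casefold()
            if c_norm == v_norm:
                continue  # identical terms are not variants
            longest = max(len(c_norm), len(v_norm))
            if edit_distance(c_norm, v_norm) / longest <= max_ratio:
                variants.setdefault(v, []).append(c)
    return variants

augmented = find_variants(["Jon Stewart", "Madonna"], ["John Stewart"])
```

Normalizing by length keeps short terms from matching everything; a fixed absolute threshold would be an equally reasonable choice for a catalog of long titles.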
Finally, by virtue of their high incidence count, repeated appearance in history as either a candidate or verified search term, or other criterion, include those candidate search terms which do not point to actual content elements, but which the ASR system should nevertheless recognize. We refer to such elements as “null search terms.”
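One possible selection rule for null search terms is sketched below. The thresholds `min_count` and `min_history`, the function name, and the exact-match test against verified terms are all assumptions for illustration; the text leaves the criterion open.

```python
from collections import Counter

def select_null_terms(candidates, verified, history, min_count=3, min_history=2):
    """Pick candidates with no matching content that are still worth recognizing.

    candidates: current candidate terms, repeated once per observed incidence.
    verified:   terms that point to content in the repository.
    history:    earlier term lists (past candidate or final search terms).
    """
    verified_keys = {v.casefold() for v in verified}
    counts = Counter(c.casefold() for c in candidates)
    history_hits = Counter()
    for past in history:
        for key in {t.casefold() for t in past}:
            history_hits[key] += 1  # count each past list at most once per term

    nulls, seen = [], set()
    for c in candidates:
        key = c.casefold()
        if key in verified_keys or key in seen:
            continue
        seen.add(key)
        if counts[key] >= min_count or history_hits[key] >= min_history:
            nulls.append(c)
    return nulls

nulls = select_null_terms(
    candidates=["Artist A", "Artist A", "Artist A", "Artist B", "Show C"],
    verified=["Show C"],
    history=[["Artist B"], ["artist b", "Artist A"]],
)
```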
External sources comprise, for example, explicitly marked information 12 and running text 15. Explicitly marked text may be subject to an optional count filtering process 14, provided that incidence count information is available, whereby only those instances with a sufficiently high incidence count are retained. Running text is processed, as discussed above, with a module 17 that performs, for example, named entity extraction (NEE) or topic detection and tracking (TDT). The data from all external sources are combined by a module 18, and an output comprising candidate search terms (C[i]) 19 is generated. The combined output from external sources is further processed by a module 22 that performs such functions as incidence counting, low pass filtering, and other functions as desired, and is also passed to an approximate text matching module 33 (discussed below). This module 22 also receives historical information, such as a history of candidate search terms (C[i−1] . . . ) 20, a history of final search terms (S[i−1] . . . ) 21, and verified search terms (discussed in greater detail below). The output of the module 22 is provided to a further module 23, which identifies null search terms (N[i]), as discussed above.
Internal sources comprise, for example, explicitly marked information 27 and running text 28. Explicitly marked text may be subject to an optional count filtering process 29, whereby only those instances with a sufficiently high incidence count are retained. Running text is processed, as discussed above, with a module 30 that performs, for example, named entity extraction (NEE) or topic detection and tracking (TDT). The data from all internal sources are combined by a module 31, and an output comprising verified search terms (V[i]) 32 is generated. The verified search terms are used in connection with the module 22, as discussed above. The verified search terms are also provided to a module 33 for approximate text matching by linguistic edit distance techniques. The module 33 also receives the candidate search terms 19 as an input. The output of the module 33 is provided to a module 34 that generates augmented verified search terms (AV[i]).
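The count filtering and source combination steps for external sources might look like the following sketch. The function names, the dictionary input format, and the case-insensitive duplicate handling are assumptions, not details given in the text.

```python
def count_filter(counted_terms, min_count):
    """Keep only terms whose incidence count reaches min_count.

    counted_terms: dict mapping term -> incidence count (assumed input format).
    """
    return [term for term, n in counted_terms.items() if n >= min_count]

def combine_sources(*term_lists):
    """Merge term lists, dropping case-insensitive duplicates, first spelling wins."""
    seen, combined = set(), []
    for terms in term_lists:
        for term in terms:
            key = term.casefold()
            if key not in seen:
                seen.add(key)
                combined.append(term)
    return combined

marked = count_filter({"Artist A": 120, "Artist B": 2}, min_count=10)
from_text = ["artist a", "Topic C"]  # e.g. output of an NEE/TDT step
candidate_terms = combine_sources(marked, from_text)
```

The same two helpers would apply unchanged to the internal sources, with repository metadata in place of published lists.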
The processed external sources information that is output by the module 23 and the processed internal sources information that is output by the module 34 are provided as inputs to a combining module 24 to produce final search terms (S[i]) 25, which are output.
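A hedged sketch of this final combination follows. It assumes the augmented verified terms arrive as a mapping from each verified term to its variants, and it represents a null search term as an entry that resolves to `None`; both choices are illustrative and not taken from the text.

```python
def build_final_terms(null_terms, augmented_verified):
    """Combine null terms N[i] and augmented verified terms AV[i] into S[i].

    Returns a dict: recognizable term -> verified term it resolves to,
    or None when the term is a null search term with no content behind it.
    """
    final = {}
    for verified, variants in augmented_verified.items():
        final[verified] = verified
        for variant in variants:
            final.setdefault(variant, verified)  # first mapping wins
    for term in null_terms:
        final.setdefault(term, None)             # never override real content
    return final

final_terms = build_final_terms(
    null_terms=["Artist B"],
    augmented_verified={"John Stewart": ["Jon Stewart"]},
)
```

With this shape, a recognizer front end could look up a recognized phrase and either route to the resolved content or report that no suitable content is present when the value is `None`.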
Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should only be limited by the Claims included below.
This application is a continuation of U.S. patent application Ser. No. 11/930,951, filed Oct. 31, 2007, which is a divisional application of U.S. patent application Ser. No. 10/699,543, filed Oct. 30, 2003, which claims priority to U.S. provisional patent application Ser. No. 60/422,561, filed Oct. 31, 2002, each of which is incorporated herein in its entirety by this reference thereto.
Number | Date | Country | |
---|---|---|---|
20130060789 A1 | Mar 2013 | US |
Number | Date | Country | |
---|---|---|---|
60422561 | Oct 2002 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 10699543 | Oct 2003 | US |
Child | 11930951 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 11930951 | Oct 2007 | US |
Child | 13667446 | US |