Speech translation systems are known in which a spoken utterance is converted to text by an automatic speech recognition (ASR) system. The recognized text is then translated into target-language text by a machine translation (MT) system, and the target-language text is subsequently re-synthesized by a text-to-speech synthesizer.
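The conventional cascade can be sketched as three chained stages. The following is a minimal illustration only; the three functions are hypothetical stubs standing in for real ASR, MT, and TTS engines, and the tiny lexicon is invented for the example.

```python
def recognize_speech(audio: bytes) -> str:
    """ASR stage: convert audio to source-language text (stubbed)."""
    return "hello world"

def translate_text(text: str, target_lang: str) -> str:
    """MT stage: translate source text into the target language (stubbed
    with a toy English-to-Spanish lexicon)."""
    lexicon = {"hello": "hola", "world": "mundo"}
    return " ".join(lexicon.get(word, word) for word in text.split())

def synthesize_speech(text: str) -> bytes:
    """TTS stage: re-synthesize target-language text as audio (stubbed)."""
    return text.encode("utf-8")

def speech_to_speech(audio: bytes, target_lang: str = "es") -> bytes:
    """The conventional cascade: ASR, then MT, then TTS."""
    source_text = recognize_speech(audio)
    target_text = translate_text(source_text, target_lang)
    return synthesize_speech(target_text)
```

Note that in this cascade each stage sees only the text output of the previous one, which is precisely why the non-textual information discussed below is lost.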
The present application describes determining additional information from speech, beyond the conventional text information.
These and other aspects will now be described in detail with reference to the accompanying drawings, wherein:
The operation can be carried out by a programmed computer that runs the flowcharts described herein. The computer can be as shown in
Additional information extracted from the speech channel can be used to enrich the output of the translation process. The additional information can include keywords, prominence information, emotional information, and class descriptors, as well as other prosodic information that is often discarded in the speech-to-text conversion and in the ensuing text-to-text translation.
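A minimal sketch of attaching such descriptors to recognized text follows. The keyword set and the `pitch_peaks` parameter (word indices carrying acoustic emphasis) are assumptions for illustration; a real system would derive prominence from the audio channel rather than receive it as an argument.

```python
from dataclasses import dataclass, field

# Illustrative training vocabulary of emotion-indicating words.
EMOTION_KEYWORDS = {"upset", "angry"}

@dataclass
class AnnotatedUtterance:
    """Recognized text together with its extracted descriptors."""
    text: str
    descriptors: dict = field(default_factory=dict)

def extract_meta(text: str, pitch_peaks: list[int]) -> AnnotatedUtterance:
    """Attach keyword, prominence, and emotion descriptors to recognized
    text. pitch_peaks lists word indices with acoustic emphasis (assumed
    to come from the speech channel)."""
    words = text.split()
    descriptors = {
        "prominent": [words[i] for i in pitch_peaks if i < len(words)],
        "emotion": [w for w in words if w in EMOTION_KEYWORDS],
    }
    return AnnotatedUtterance(text, descriptors)
```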
Similarly, prominence information can indicate emphasis, either through the particular words used or through sentence-level stress that marks an emphasized statement.
Emotional words may include words that indicate the user's state of mind, such as profanities, words like "upset", and other keywords that can be used to train the system. The emotions may also be determined from the cadence of the speech being recognized. For example, a filter may be trained to recognize emotion-laden speech such as whining, crying, or screaming.
These and other words that are recognized as descriptors of information in the text become descriptors 200. The descriptors accompany the text and form a feature-rich statistical machine translation result 230, which may be, for example, a training corpus.
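One plausible way to make descriptors accompany the text in a training corpus is a factored representation, where each token carries its descriptor flags. This is a sketch under that assumption; the `word|prominence|emotion` notation and flag letters are invented for illustration, not the format prescribed by the system.

```python
def to_factored(tokens: list[str], descriptors: dict[str, list[str]]) -> str:
    """Render each token as word|prominence|emotion factors, a
    feature-rich representation a factored SMT trainer could consume.
    'P' marks a prominent token, 'E' an emotion-bearing token, '-' neither."""
    out = []
    for word in tokens:
        prom = "P" if word in descriptors.get("prominent", []) else "-"
        emo = "E" if word in descriptors.get("emotion", []) else "-"
        out.append(f"{word}|{prom}|{emo}")
    return " ".join(out)
```

A corpus built this way keeps the meta information aligned with each word, so training can condition translation choices on prominence and emotion rather than on the bare text alone.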
The meta information is preferably extracted from the real audio rather than from transcripts, which allows the emotion, the emphasis, and other information to be obtained. This training and the subsequent translation may be expensive in terms of computing resources.
This produces a lattice of translated information in the target language at 335, which is presented along with the descriptors at 340. At 345, a lattice rescoring operation merges the two information channels.
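The merging step at 345 can be sketched as a simple log-linear rescoring over lattice hypotheses. The descriptor score below (a bonus for hypotheses that preserve source keywords) and the interpolation weight are illustrative assumptions, not the scoring function of the actual system.

```python
def rescore_lattice(hypotheses: list[tuple[str, float]],
                    keywords: set[str],
                    weight: float = 0.5) -> str:
    """Merge two information channels: each hypothesis carries an MT model
    score; a descriptor channel adds a bonus per preserved keyword. The
    best hypothesis under the combined score is returned."""
    def descriptor_score(hyp: str) -> float:
        words = set(hyp.split())
        return sum(1.0 for k in keywords if k in words)

    rescored = [(hyp, score + weight * descriptor_score(hyp))
                for hyp, score in hypotheses]
    return max(rescored, key=lambda pair: pair[1])[0]
```

Under this toy scoring, a hypothesis with a slightly lower MT score can win if it better preserves the emphasized keywords carried by the descriptor channel.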
The above describes both training and translating; however, it should be understood that this system can be applied to training, to translating, or to both, using the meta information.
Described herein are the general structure and techniques, as well as more specific embodiments that can be used to effect different ways of carrying out the more general goals.
Although only a few embodiments have been disclosed in detail above, other embodiments are possible and the inventor(s) intend these to be encompassed within this specification. The specification describes specific examples to accomplish a more general goal that may be accomplished in another way. This disclosure is intended to be exemplary, and the claims are intended to cover any modification or alternative which might be predictable to a person having ordinary skill in the art. For example, this can be used for speech recognition and/or speech translation, or training of such a system, or for any subset or superset thereof.
Also, the inventor(s) intend that only those claims which use the words “means for” are intended to be interpreted under 35 USC 112, sixth paragraph. Moreover, no limitations from the specification are intended to be read into any claims, unless those limitations are expressly included in the claims. The computers described herein may be any kind of computer, either general purpose, or some specific purpose computer such as a workstation. The computer may be a Pentium class computer, running Windows XP or Linux, or may be a Macintosh computer. The computer may also be a handheld computer, such as a PDA, cellphone, or laptop.
The programs may be written in C, Java, Brew, or any other programming language. The programs may be resident on a storage medium, e.g., magnetic or optical, such as the computer's hard drive, or a removable disk or medium such as a memory stick or SD media. The programs may also be run over a network, for example, with a server or other machine sending signals to the local machine, which allows the local machine to carry out the operations described herein.
Where a specific numerical value is mentioned herein, it should be considered that the value may be increased or decreased by 20%, while still staying within the teachings of the present application, unless some different range is specifically mentioned.
This application claims priority to U.S. Provisional Application 60/803,220, filed May 25, 2006. The disclosure of the prior application is considered part of (and is incorporated by reference in) the disclosure of this application.
The U.S. Government may have certain rights in this invention pursuant to Grant No. N66001-02-C-6023 awarded by DARPA/SPAWAR.