Interpreters are essential for translation when people who speak different languages communicate with each other; however, hiring an interpreter is costly, and interpreters are not always available. Thus, a mobile machine language translator is needed. Such a translator would be useful and cost-effective in many circumstances, such as when a tourist visits a foreign place where a different language is spoken, or in a business meeting between people who speak different languages. Although a two-way spoken language translator is used in this application as the example to explain the design of the invention, the same design principle can be used in any recording or communication device to achieve a good signal-to-noise ratio (SNR).
Currently, commercially available mobile language translation devices perform one-way, fixed-phrase translation: the device translates one person's speech into another person's language, but not vice versa. Examples are the Phraselator® from Voxtec Inc. and Patent Application Number 03058606. One-way spoken language translation limits the scope and capacity of the communication between speaker one and speaker two. Therefore, it is desirable to have a more effective device capable of translating simultaneously between two or more speakers using different languages.
Facilitated by a multi-directional microphone array, the present invention is capable of translating one person's speech in one language into another language, either in the form of text or speech, for another person, and vice versa. Referring to
In addition, the system includes a language translation system 120 that is capable of translating language one into language two and translating language two into language one; a speech synthesizer 118 that is capable of synthesizing speech from the text of language one and from the text of language two; one or two displaying devices 114, 124 that are capable of displaying relevant text on screen 220; and one or two loudspeakers 116, 122 that are capable of playing out the synthesized speech. The present invention is superior to the prior art for the following reasons:
Other objects, features, and advantages of the present invention will become apparent from the following detailed description of the preferred but non-limiting embodiment. The description is made with reference to the accompanying drawings in which:
In one embodiment of the present invention, the microphone components can be placed in a 3-D space, and those components can form any 3-D shape inside or outside a mobile computation device. Alternatively, one microphone array can be mounted on the front side of a mobile computation device 200 while another microphone array can be mounted on the back of the computation device 210. A microphone array algorithm can be linear or non-linear. Two fixed patterns of beams computed by the algorithm, as shown in
Furthermore, the language translation system 120 will then convert the text of language one into text of language two, which can be displayed on the screen 124 or fed into the speech synthesizer 118 to convert the text of language two into speech of language two. After speaker two receives the converted linguistic information from speaker one, speaker two can talk back to speaker one in language two. Microphone array number two will capture speaker two's speech through a fixed acoustic beam. Similarly, the signal pre-processor 110 will convert the speech of language two into a digital signal, whose noise will be further suppressed, and then pass it to the automatic speech recognizer 112. The speech recognizer will convert the speech of language two into text of language two. The language translation system 120 will then convert the text of language two into text of language one, which can be displayed on the screen 114 or fed into the speech synthesizer 118 to convert the text of language one into speech of language one. In this way, two persons speaking different languages can communicate with each other face-to-face in real time.
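The signal flow described above can be sketched in code. The following is a minimal illustration only, not the implementation of the invention: the class name `TranslationPath`, the helper `make_stub_path`, the DC-offset pre-processor, and the toy dictionaries are all assumptions introduced for this example, and each stage is a placeholder standing in for the corresponding component (pre-processor 110, recognizer 112, translation system 120, synthesizer 118).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Samples = List[float]


@dataclass
class TranslationPath:
    """One direction of the two-way translator (hypothetical structure)."""
    preprocess: Callable[[Samples], Samples]   # stands in for pre-processor 110
    recognize: Callable[[Samples], str]        # stands in for recognizer 112
    translate: Callable[[str], str]            # stands in for translation system 120
    synthesize: Callable[[str], Samples]       # stands in for synthesizer 118

    def run(self, beam_samples: Samples) -> Tuple[str, Samples]:
        """Return the translated text (for display) and audio (for playback)."""
        clean = self.preprocess(beam_samples)
        source_text = self.recognize(clean)
        target_text = self.translate(source_text)
        return target_text, self.synthesize(target_text)


def remove_dc_offset(samples: Samples) -> Samples:
    """Placeholder pre-processing step: subtract the mean of the signal."""
    mean = sum(samples) / len(samples) if samples else 0.0
    return [s - mean for s in samples]


def make_stub_path(heard_text: str, dictionary: Dict[str, str]) -> TranslationPath:
    """Build a path whose recognizer and synthesizer are stubs (assumption)."""
    return TranslationPath(
        preprocess=remove_dc_offset,
        recognize=lambda samples: heard_text,             # pretend recognition
        translate=lambda text: dictionary.get(text, text),
        synthesize=lambda text: [0.0] * len(text),        # silent placeholder audio
    )


# One path per direction, each fed by its own fixed acoustic beam.
path_one_to_two = make_stub_path("hello", {"hello": "hola"})
path_two_to_one = make_stub_path("gracias", {"gracias": "thank you"})

text_for_two, audio_for_two = path_one_to_two.run([0.1, 0.3, 0.2])
text_for_one, audio_for_one = path_two_to_one.run([0.2, 0.0, 0.4])
```

Keeping one path object per direction mirrors the description, in which each speaker's beam feeds the same chain of stages with the source and target languages swapped.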
In another embodiment of the present invention when speaker one and/or speaker two move while talking, as shown in
In yet another embodiment of the present invention when multiple parties are involved in the communication, acoustic beams can be configured to form in real time to focus on the current speaker, as in
The bi-directional microphone array can be formed by two sets of beam forming parameters, as shown in
Traditionally, the sound direction is computed with a linear time-delay system, as in
In order to reduce the geometric size of a microphone array without reducing the beam forming performance, this invention increases the sampling rate during the beam forming computation. The sampling rate of the output of the microphone array can then be reduced to the required rate. For example, if a system needs only an 8 kHz sampling rate, the rate is increased to 32 kHz, 44 kHz, or even higher in order to reduce the size of the microphone array. After the beam forming computation, the sampling rate is reduced to 8 kHz.
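One plausible reading of why a higher sampling rate permits a smaller array is that a beam former working with whole-sample delays can only resolve path differences of one sample period, so the distance sound travels in one sample period shrinks as the rate rises. The sketch below illustrates that arithmetic and a crude rate reduction. The speed of sound (343 m/s), the function names, and the block-averaging downsampler are assumptions made for illustration; the application does not specify them.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def spacing_per_sample_delay(sample_rate_hz: float) -> float:
    """Distance in meters that sound travels during one sample period."""
    return SPEED_OF_SOUND_M_S / sample_rate_hz


def decimate_by_averaging(samples, factor: int):
    """Crude downsampler: average each full block of `factor` samples.

    Averaging acts as a simple low-pass filter before the rate is reduced.
    A trailing partial block is dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


# Compare the smallest path difference resolvable with whole-sample delays.
for rate_hz in (8000, 32000, 44100):
    centimeters = spacing_per_sample_delay(rate_hz) * 100.0
    print(f"{rate_hz} Hz: {centimeters:.2f} cm per sample of delay")

# After beam forming at 32 kHz, return to 8 kHz (factor of 4).
beam_output_32k = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
beam_output_8k = decimate_by_averaging(beam_output_32k, 4)
```

Under these assumptions, one sample of delay corresponds to roughly 4.3 cm at 8 kHz but only about 1.1 cm at 32 kHz, which is consistent with the claim that a higher internal rate allows more closely spaced microphones.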
The invention also provides a feature by which the speech generated by the text-to-speech (TTS) synthesizer sounds like the voice of the current speaker. For example, after speaker one talks in one language, the system translates speaker one's speech into another language and then plays the translation through a loudspeaker using the TTS system. The invention can make the translated speech sound like speaker one. This can be implemented by first estimating and saving speaker one's speech characteristics, such as speaker one's pitch and timbre, with a signal processing algorithm, and then using the saved pitch and timbre in the synthesized speech.
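As an illustration of the estimation step, the sketch below estimates a speaker's pitch from a short frame using autocorrelation. This is one common technique chosen for the example, not necessarily the signal processing algorithm intended by the application; the function name `estimate_pitch_hz`, the 60 to 400 Hz search range, and the synthetic test tone are assumptions. Timbre estimation and the use of the saved values inside a synthesizer are not shown.

```python
import math


def estimate_pitch_hz(samples, sample_rate_hz, min_hz=60.0, max_hz=400.0):
    """Estimate pitch as the lag with the largest autocorrelation.

    Only lags corresponding to pitches between min_hz and max_hz are
    searched. Returns 0.0 if no lag has a positive correlation.
    """
    min_lag = int(sample_rate_hz / max_hz)
    max_lag = min(int(sample_rate_hz / min_hz), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(
            samples[i] * samples[i + lag]
            for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag if best_lag else 0.0


# Synthetic stand-in for a voiced frame: a 200 Hz tone, 0.1 s at 8 kHz.
SAMPLE_RATE_HZ = 8000
frame = [
    math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE_HZ)
    for n in range(800)
]
saved_pitch_hz = estimate_pitch_hz(frame, SAMPLE_RATE_HZ)
```

The saved value could then be supplied to a synthesizer that accepts a target pitch, so that the translated speech is rendered near the original speaker's pitch.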
Alternatively, the present system can be implemented on any computation device, including computers, personal computers, PDAs, laptop personal computers, or wireless telephone handsets. The communication mode can be face-to-face or remote through an analog, digital, or IP-based network. There are many alternative ways in which the invention can be used, including but not limited to:
As a translator for any personnel speaking any language;
As a translator for any personnel in foreign countries;
As a translator for international tourists;
As a translator for international business conferences and negotiations.
Although the present invention has been fully described in connection with the preferred embodiments thereof with reference to the accompanying drawings, it is to be noted that various changes and modifications are apparent to those skilled in the art. Such changes and modifications are to be understood as included within the scope of the present invention as defined by the appended claims unless they depart therefrom.
This application claims priority from U.S. Provisional Patent Application No. USPTO 60/684061, filed on May 24, 2005.
Number | Date | Country
---|---|---
60684061 | May 2005 | US