SBIR Phase II: Incorporation of Knowledge Base into Statistical Machine Translation

Information

NSF Award
0548763

Owner

FLUENTIAL, LLC

Award Id
0548763
Award Effective Date
1/15/2006
Award Expiration Date
12/31/2008
Award Amount
$754,998.00
Award Instrument
Standard Grant

Information

This Small Business Innovation Research (SBIR) Phase II project embodies an innovative approach to machine translation. The proposed model aims to overcome two important bottlenecks in the development of a high-quality statistical machine translation (SMT) system: (1) the inability to handle structural problems and (2) the dependence on huge amounts of parallel text. The inability of statistics to sufficiently handle grammatical problems such as word order becomes more evident when the two languages differ greatly in structure and morphology, as English and Korean do. The dependence on huge amounts of parallel text is a particular challenge for speech translation. Based on successful tests in the Phase I project, this project proposes a method to learn linguistic knowledge crucial to handling word order and non-local dependencies automatically from input and to incorporate it into SMT along with simple transformations, maximizing the strengths of both knowledge-based and statistical approaches and minimizing the need for ever-increasing amounts of bilingual data. The proposed approach aims to build a syntactic-phrase-based statistical machine translation engine that is not only more accurate than existing word-based ones but also less dependent on large data sources.

The primary impact of the proposed project is the potential for achieving automatic translation quality as high as that of the best knowledge-based machine translation engines, but with a minimum of handcrafted knowledge and therefore at a much lower cost in development time and human resources. While the research is specifically concerned with MT between English and Korean, the resulting translation models would potentially be usable for translation between any pair of languages.

The results of the research will be used to develop a speech translation device, in particular to overcome language barriers in communication with patients in hospitals. It will provide a key technology that will accelerate the development of speech translation applications in order to reduce costs for healthcare providers and to enhance the quality of healthcare. Additionally, the proposed method of learning linguistic features will have an impact on many different applications, including speech recognition, search engines, genre and topic detection, and document search and query. Finally, the proposed research will have beneficial impacts nationally and globally by helping to solve the 'automatic translation' problem, an area of paramount importance to the economic welfare and security of the United States and the rest of the world.

Program Officer
Ian M. Bennett
Min Amd Letter Date
1/11/2006
Max Amd Letter Date
12/6/2007
ARRA Amount

Institutions

Name
Fluential, Inc.
City
Sunnyvale
State
CA
Country
United States
Address
1153 Bordeaux Drive, Suite 211
Postal Code
94089-1224
Phone Number
(408) 747-1010

Investigators

First Name
Farzad
Last Name
Ehsani
Email Address
farzad@fluentialinc.com
Start Date
10/17/2007

First Name
Yookyung
Last Name
Kim
Email Address
kim@sehda.com
Start Date
1/11/2006
End Date
10/17/2007

FOA Information

Name
Computer Science
Code
912
