SBIR Phase II: Incorporation of Knowledge Base into Statistical Machine Translation

Information

  • NSF Award
    0548763
Owner
  • Award Id
    0548763
  • Award Effective Date
    1/15/2006
  • Award Expiration Date
    12/31/2008
  • Award Amount
    $ 754,998.00
  • Award Instrument
    Standard Grant

Abstract

This Small Business Innovation Research (SBIR) Phase II project embodies an innovative approach to machine translation. The proposed model aims to overcome two important bottlenecks in the development of a high-quality statistical machine translation (SMT) system: (1) the inability to handle structural problems and (2) dependence on huge amounts of parallel text. The inability of statistical methods to adequately handle grammatical problems such as word order becomes more evident when the two languages in a pair differ greatly in structure and morphology, as English and Korean do. The dependence on huge amounts of parallel text is a great challenge, especially for speech translation. Based on successful tests in the Phase I project, this project proposes a method to automatically learn, from input, the linguistic knowledge crucial to handling word order and non-local dependencies, and to incorporate it into SMT along with simple transformations. The approach is intended to maximize the strengths of both knowledge-based and statistical approaches and to minimize the need for ever-increasing amounts of bilingual data. It aims to build a syntactic-phrase-based statistical machine translation engine that is not only more accurate than existing word-based engines but can also reduce the need for large data sources.

The primary impact of the proposed project is the potential to achieve automatic translation quality as high as that of the best knowledge-based machine translation engines, but with minimal handcrafting of knowledge and therefore at a much lower cost in development time and human resources. While the research is specifically concerned with machine translation (MT) between English and Korean, the resulting translation models would potentially be usable for translation between any pair of languages.

The results of the research will be used to develop a speech translation device, in particular to overcome language barriers in communication with patients in hospitals. The research will provide a key technology that will accelerate the development of speech translation applications, reducing costs for healthcare providers and enhancing the quality of healthcare. Additionally, the proposed method of learning linguistic features will have an impact on many other applications, including speech recognition, search engines, genre and topic detection, and document search and query. Finally, the proposed research will have beneficial impacts nationally and globally by helping to solve the 'automatic translation' problem, an area of paramount importance to the economic welfare and security of the United States and the rest of the world.

  • Program Officer
    Ian M. Bennett
  • Min Amd Letter Date
    1/11/2006
  • Max Amd Letter Date
    12/6/2007
  • ARRA Amount

Institutions

  • Name
    Fluential, Inc.
  • City
    Sunnyvale
  • State
    CA
  • Country
    United States
  • Address
    1153 Bordeaux Drive, Suite 211
  • Postal Code
    94089-1224
  • Phone Number
    (408) 747-1010

Investigators

  • First Name
    Farzad
  • Last Name
    Ehsani
  • Email Address
    farzad@fluentialinc.com
  • Start Date
    10/17/2007
  • First Name
    Yookyung
  • Last Name
    Kim
  • Email Address
    kim@sehda.com
  • Start Date
    1/11/2006
  • End Date
    10/17/2007

FOA Information

  • Name
    Computer Science
  • Code
    912