SBIR Phase I: Incorporation of Knowledge Base into Statistical Machine Translation

Information

  • NSF Award
  • 0441891
Owner
  • Award Id
    0441891
  • Award Effective Date
    1/1/2005
  • Award Expiration Date
    6/30/2005
  • Award Amount
    $ 100,000.00
  • Award Instrument
    Standard Grant

This Small Business Innovation Research (SBIR) Phase I project proposes an innovative approach to machine translation. The project aims to overcome two important bottlenecks in the development of a high-quality Statistical Machine Translation (SMT) system: (1) the inability to handle structural problems, and (2) dependence on huge amounts of parallel text. The inability of purely statistical methods to handle grammatical problems such as word order becomes more evident when the two languages differ greatly in structure and morphology, as English and Korean do. This project proposes a method that automatically learns, from text, the linguistic knowledge crucial to handling word order and nonlocal dependencies, and incorporates it into SMT along with simple transformations. This combines the strengths of knowledge-based and statistical approaches while minimizing the need for ever-increasing amounts of bilingual data. The approach aims to build a syntactic-phrase-based SMT engine that is not only more accurate than existing word-based ones but also less dependent on large data sources. The primary impact of the project is the potential to achieve automatic translation quality as high as that of the best knowledge-based machine translation engines while requiring a minimum of handcrafted knowledge, and therefore much lower cost in development time and human resources.

While the research is specifically concerned with MT between English and Korean, the resulting translation models would potentially be usable for translation between any pair of languages. In addition to benefiting machine translation research and applications directly, the research will provide significant progress towards building bilingual phrase lexicons from data, which in turn will aid multilingual tasks such as cross-lingual information retrieval. Sehda's syntactic-phrase-based MT engine can produce unambiguous phrase translations, useful for indexing foreign documents and constructing keyword lists for document summaries. Additionally, the project's method for learning features to augment traditional language modeling will have an impact on many applications, including speech recognition, search engines, genre and topic detection, and document search and query. Lastly, this research has beneficial impacts nationally and globally by helping to solve the "automatic translation" problem, an area of paramount importance to the economic welfare and security of the US, as well as to the rest of the world.

  • Program Officer
    Ian M. Bennett
  • Min Amd Letter Date
    11/9/2004
  • Max Amd Letter Date
    11/9/2004
  • ARRA Amount

Institutions

  • Name
    Fluential, Inc.
  • City
    Sunnyvale
  • State
    CA
  • Country
    United States
  • Address
    1153 Bordeaux Drive, Suite 211
  • Postal Code
    94089-1224
  • Phone Number
    4087471010

Investigators

  • First Name
    Yookyung
  • Last Name
    Kim
  • Email Address
    kim@sehda.com
  • Start Date
    11/9/2004

FOA Information

  • Name
    Computer Science
  • Code
    912