Collaborative Research: Structure Alignment-based Machine Translation

Information

  • NSF Award
  • 0534325
Owner
  • Award Id
    0534325
  • Award Effective Date
7/1/2006
  • Award Expiration Date
6/30/2010
  • Award Amount
$82,000.00
  • Award Instrument
    Continuing grant

Abstract

Researchers at New York University, Monmouth University, and the University of Colorado are constructing Japanese/English and Chinese/English machine translation systems that automatically acquire rules from "deep" linguistic analyses of parallel text. This work is a natural culmination of automated example-based Machine Translation (MT) projects that have become increasingly sophisticated over the last two decades. The following recent advances in Natural Language Processing (NLP) technologies make this inquiry feasible: (1) annotated data, including bilingual treebanks, and processors trained on that data (parsers, PropBankers, etc.); (2) semantic post-processors of parser output; (3) programs that automatically align bitexts; and (4) bilingual tree-to-tree translation models.

Natural languages vary widely in the ordering of corresponding words for equivalent expressions, both across linguistic boundaries and within a single language. This research investigates ways to minimize the variation within a single language using a type of semantic representation (GLARF) that is derived automatically from syntactic trees. Such a semantic representation provides: (1) a reduction in the number of ways of representing the same underlying message, and (2) a way to handle long-distance dependencies (e.g., relative clauses) as local phenomena. There is therefore no need to resort to arbitrarily long sentence fragments or large trees for training. Furthermore, since less data is needed, the approach minimizes the sparse-data problem.

In training this translation model, advantage (1) reduces the number of mapping rules between the source tree and the target tree. The translation model, then, is a tree transducer with "deep" linguistically analyzed trees for both source and target representations.
To provide efficient computer algorithms for such partial mappings, this research focuses on (a) the training algorithm and (b) the constraints on the mapping rules, in order to reduce the computational complexity.

This research is expected to yield several advantages: the core architecture of this transducer, using "deep" linguistic analyses, should yield more accurate results, and the GLARF architecture allows control over the granularity of the automatically obtained linguistic analyses.

Broader Impact: The demand for machine translation spans local government (e.g., police forces), national government (e.g., the CIA), and the private sector. Given the growth of the Internet outside the English-speaking world, better machine translation is of critical importance for the broader community. This work directly affects the ability of English speakers to understand websites written in Chinese and Japanese, two of the most widely used languages on the Internet. The technique is generalizable to other language pairs and can ultimately have even wider impact.
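The claim that a semantic layer reduces the number of surface forms to handle can be sketched in miniature. The toy structures and function names below are illustrative assumptions, not the actual GLARF formalism: an active clause and a relative clause are mapped to the same local predicate-argument structure, resolving the long-distance dependency locally.

```python
# Hypothetical sketch: two surface parses that a GLARF-style semantic
# layer could collapse into one predicate-argument structure.
# (Illustrative only; not the actual GLARF representation.)

def normalize_active(tree):
    """Map a simple active-voice parse to predicate-argument form."""
    return {"PRED": tree["verb"], "ARG0": tree["subject"], "ARG1": tree["object"]}

def normalize_relative(tree):
    """Map a relative-clause parse, resolving the long-distance
    dependency: the head noun fills the gapped object slot locally."""
    clause = tree["rel_clause"]
    return {"PRED": clause["verb"], "ARG0": clause["subject"], "ARG1": tree["head"]}

# "the dog chased the cat"
active = {"verb": "chase", "subject": "dog", "object": "cat"}
# "the cat that the dog chased"
relative = {"head": "cat", "rel_clause": {"verb": "chase", "subject": "dog"}}

# Both surface forms normalize to the same underlying message,
# so a single mapping rule covers both during training.
assert normalize_active(active) == normalize_relative(relative)
```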
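The tree-transducer idea itself can also be sketched. The rule table and tree encoding below are simplified assumptions for illustration: a transfer rule translates leaves through a bilingual lexicon and reorders children per node label, as in the English SVO to Japanese SOV mapping.

```python
# Hypothetical sketch of tree-to-tree transfer, the kind of mapping a
# tree transducer learns between source and target analyses.
# Trees are (label, children) tuples; leaves are plain strings.

def transfer(node, lexicon, reorder):
    """Recursively translate a source tree: translate leaves via the
    lexicon, then reorder children using the rule for the node label."""
    if isinstance(node, str):
        return lexicon.get(node, node)
    label, children = node
    translated = [transfer(c, lexicon, reorder) for c in children]
    order = reorder.get(label, range(len(translated)))
    return (label, [translated[i] for i in order])

# Toy bilingual lexicon and a single reordering rule (assumed data).
lexicon = {"eat": "taberu", "fish": "sakana", "cats": "neko"}
reorder = {"S": [0, 2, 1]}  # clause level: SVO -> SOV

src = ("S", ["cats", "eat", "fish"])
print(transfer(src, lexicon, reorder))  # ('S', ['neko', 'sakana', 'taberu'])
```

Because each rule is local to one node label, the rule inventory stays small; constraining which reorderings are allowed per label is exactly the kind of restriction on mapping rules that keeps training tractable.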

  • Program Officer
    Tatiana D. Korelsky
  • Min Amd Letter Date
7/14/2006
  • Max Amd Letter Date
4/25/2008
  • ARRA Amount

Institutions

  • Name
    Monmouth University
  • City
    West Long Branch
  • State
    NJ
  • Country
    United States
  • Address
    400 Cedar Avenue
  • Postal Code
07764-1898
  • Phone Number
(732) 571-4491

Investigators

  • First Name
    Michiko
  • Last Name
    Kosaka
  • Email Address
    kosaka@monmouth.edu
  • Start Date
    7/14/2006 12:00:00 AM

FOA Information

  • Name
    Information Systems
  • Code
    104000