METHOD AND APPARATUS FOR TRAINING BILINGUAL WORD ALIGNMENT MODEL, METHOD AND APPARATUS FOR BILINGUAL WORD ALIGNMENT

Information

  • Patent Application
  • Publication Number
    20070203690
  • Date Filed
    February 23, 2007
  • Date Published
    August 30, 2007
Abstract
The present invention provides a method and apparatus for bilingual word alignment, and a method and apparatus for training a bilingual word alignment model. The method for training a bilingual word alignment model comprises: training a bilingual word alignment model for a first language and a second language, using a bilingual corpus of the first and second languages; training a bilingual word alignment model for the second language and a third language, using a bilingual corpus of the second and third languages; and estimating a bilingual word alignment model for the first language and the third language, based on said bilingual word alignment model for the first and second languages and said bilingual word alignment model for the second and third languages.
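As a rough illustration of the estimating step, the sketch below composes two word translation tables through the second (pivot) language. The composition rule used here (sum over pivot words, then renormalize) is an assumption modeled on the position distortion formula stated in claim 4; the function name `estimate_pivot_translation`, the nested-dict layout and the toy tokens are all hypothetical and are not taken from the application.

```python
from collections import defaultdict


def estimate_pivot_translation(p_ce, p_ej):
    """Estimate p(w_c | w_j) from two pivot-language tables.

    p_ce[w_e][w_c] = p(w_c | w_e)  (second language -> first language)
    p_ej[w_j][w_e] = p(w_e | w_j)  (third language -> second language)
    Returns p_cj[w_j][w_c]. The sum-and-normalize rule is an assumption.
    """
    p_cj = {}
    for w_j, pivots in p_ej.items():
        counts = defaultdict(float)
        for w_e, prob_e in pivots.items():
            # Accumulate a soft co-occurrence count C(w_j, w_c) via pivot w_e.
            for w_c, prob_c in p_ce.get(w_e, {}).items():
                counts[w_c] += prob_c * prob_e
        total = sum(counts.values())
        if total > 0.0:
            # Normalize the counts into a conditional distribution over w_c.
            p_cj[w_j] = {w_c: c / total for w_c, c in counts.items()}
    return p_cj


# Toy, made-up tables with placeholder tokens.
p_ej = {"j1": {"e1": 0.9, "e2": 0.1}}
p_ce = {"e1": {"c1": 1.0}, "e2": {"c1": 0.5, "c2": 0.5}}
p_cj = estimate_pivot_translation(p_ce, p_ej)
```

With these toy tables, `p_cj["j1"]` assigns most of the probability mass to `c1`, because both pivot words translate to it.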
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The above-mentioned features, advantages and objectives of the present invention will be better understood from the following description of the embodiments of the invention, taken in conjunction with the drawings, in which:

FIG. 1 is a flowchart showing a method for training a bilingual word alignment model according to an embodiment of the present invention;

FIG. 2 is a flowchart showing a method for bilingual word alignment according to an embodiment of the present invention;

FIG. 3 is a block diagram showing an apparatus for training a bilingual word alignment model according to an embodiment of the present invention; and

FIG. 4 is a block diagram showing an apparatus for bilingual word alignment according to an embodiment of the present invention.

Claims
  • 1. A method for training bilingual word alignment model, comprising:
    training a bilingual word alignment model for a first language and a second language, using a bilingual corpus of the first and second languages;
    training a bilingual word alignment model for the second language and a third language, using a bilingual corpus of the second and third languages; and
    estimating a bilingual word alignment model for the first language and the third language, based on said bilingual word alignment model for the first and second languages and said bilingual word alignment model for the second and third languages.
  • 2. The method for training bilingual word alignment model according to claim 1, wherein said bilingual word alignment model for the first and second languages and said bilingual word alignment model for the second and third languages respectively comprises: a word translation sub-model, a position distortion sub-model and a word fertility sub-model; said step of estimating a bilingual word alignment model for the first language and the third language comprises:
    estimating a word translation sub-model for the first and third languages, based on the word translation sub-model for the first and second languages and the word translation sub-model for the second and third languages;
    estimating a position distortion sub-model for the first and third languages, based on the position distortion sub-model for the first and second languages and the position distortion sub-model for the second and third languages; and
    estimating a word fertility sub-model for the first and third languages, based on the word fertility sub-model for the first and second languages and/or the word fertility sub-model for the second and third languages, the word translation sub-model for the first and second languages and/or the word translation sub-model for the second and third languages.
  • 3. The method for training bilingual word alignment model according to claim 2, wherein said step of estimating a word translation sub-model for the first and third languages comprises:
    where p_CE(w_c|w_e) represents the translation probability from the second language word w_e to the first language word w_c, p_EJ(w_e|w_j) represents the translation probability from the third language word w_j to the second language word w_e, C(w_j,w_c) represents the co-occurrence count of the first language word w_c and the third language word w_j, p(w_c|w_j) represents the translation probability from the third language word w_j to the first language word w_c,
    collecting the co-occurrence count of the first language word w_c and the third language word w_j, using formula
  • 4. The method for training bilingual word alignment model according to claim 2, wherein said step of estimating a position distortion sub-model for the first and third languages comprises:
    where p_EJ(k|i,l,m′) represents the probability that the i-th position in the third language sentence having a length of l corresponds to the k-th position in the second language sentence having a length of m′, p_CE(j|k,m′,m) represents the probability that the k-th position in the second language sentence having a length of m′ corresponds to the j-th position in the first language sentence having a length of m, C(j,i,l,m) and p_CJ(j|i,l,m) respectively represent the co-occurrence count and probability that the i-th position in the third language sentence having a length of l corresponds to the j-th position in the first language sentence having a length of m,
    collecting the co-occurrence count that the i-th position in the third language sentence having a length of l corresponds to the j-th position in the first language sentence having a length of m, using formula C(j,i,l,m) = Σ_{k,m′} p_EJ(k|i,l,m′) * p_CE(j|k,m′,m); and
    calculating the position distortion probability that the i-th position in the third language sentence having a length of l corresponds to the j-th position in the first language sentence having a length of m, using formula
  • 5. The method for training bilingual word alignment model according to claim 2, wherein said step of estimating a word fertility sub-model for the first and third languages comprises:
    where p_EJ(w_e|w_j) represents the translation probability from the third language word w_j to the second language word w_e, p_CE(φ_i|w_e) represents the probability that the second language word w_e corresponds to φ_i words of the first language, C(φ_i,w_j) and p(φ_i|w_j) respectively represent the co-occurrence count and probability that the third language word w_j corresponds to φ_i words of the first language,
    collecting the co-occurrence count that the third language word w_j corresponds to φ_i words of the first language, using formula
  • 6. A method for bilingual word alignment, comprising:
    obtaining a bilingual word alignment model for a first language and a third language based on the bilingual corpus of the first and second languages and the bilingual corpus of the second and third languages, by using the method for training bilingual word alignment model according to any one of claims 1 to 5; and
    word-aligning a bilingual sentence pair of the first and third languages using said bilingual word alignment model of the first and third languages.
  • 7. An apparatus for training bilingual word alignment model, comprising:
    a first training unit configured to train a bilingual word alignment model for a first language and a second language, using a bilingual corpus of the first and second languages;
    a second training unit configured to train a bilingual word alignment model for the second language and a third language, using a bilingual corpus of the second and third languages; and
    a model estimating unit configured to estimate a bilingual word alignment model for the first language and the third language, based on said bilingual word alignment model for the first and second languages and said bilingual word alignment model for the second and third languages.
  • 8. The apparatus for training bilingual word alignment model according to claim 7, wherein said bilingual word alignment model for the first and second languages and said bilingual word alignment model for the second and third languages respectively comprises: a word translation sub-model, a position distortion sub-model and a word fertility sub-model; said model estimating unit comprises:
    a word translation sub-model estimating unit configured to estimate a word translation sub-model for the first and third languages, based on the word translation sub-model for the first and second languages and the word translation sub-model for the second and third languages;
    a position distortion sub-model estimating unit configured to estimate a position distortion sub-model for the first and third languages, based on the position distortion sub-model for the first and second languages and the position distortion sub-model for the second and third languages; and
    a word fertility sub-model estimating unit configured to estimate a word fertility sub-model for the first and third languages, based on the word fertility sub-model for the first and second languages and/or the word fertility sub-model for the second and third languages, the word translation sub-model for the first and second languages and/or the word translation sub-model for the second and third languages.
  • 9. The apparatus for training bilingual word alignment model according to claim 8, wherein p_CE(w_c|w_e) represents the translation probability from the second language word w_e to the first language word w_c, p_EJ(w_e|w_j) represents the translation probability from the third language word w_j to the second language word w_e, C(w_j,w_c) represents the co-occurrence count of the first language word w_c and the third language word w_j, p(w_c|w_j) represents the translation probability from the third language word w_j to the first language word w_c,
    said word translation sub-model estimating unit collects the co-occurrence count of the first language word w_c and the third language word w_j, using formula
  • 10. The apparatus for training bilingual word alignment model according to claim 8, wherein p_EJ(k|i,l,m′) represents the probability that the i-th position in the third language sentence having a length of l corresponds to the k-th position in the second language sentence having a length of m′, p_CE(j|k,m′,m) represents the probability that the k-th position in the second language sentence having a length of m′ corresponds to the j-th position in the first language sentence having a length of m, C(j,i,l,m) and p_CJ(j|i,l,m) respectively represent the co-occurrence count and probability that the i-th position in the third language sentence having a length of l corresponds to the j-th position in the first language sentence having a length of m,
    said position distortion sub-model estimating unit collects the co-occurrence count that the i-th position in the third language sentence having a length of l corresponds to the j-th position in the first language sentence having a length of m, using formula C(j,i,l,m) = Σ_{k,m′} p_EJ(k|i,l,m′) * p_CE(j|k,m′,m); and
    calculates the position distortion probability that the i-th position in the third language sentence having a length of l corresponds to the j-th position in the first language sentence having a length of m, using formula
  • 11. The apparatus for training bilingual word alignment model according to claim 8, wherein p_EJ(w_e|w_j) represents the translation probability from the third language word w_j to the second language word w_e, p_CE(φ_i|w_e) represents the probability that the second language word w_e corresponds to φ_i words of the first language, C(φ_i,w_j) and p(φ_i|w_j) respectively represent the co-occurrence count and probability that the third language word w_j corresponds to φ_i words of the first language,
    said word fertility sub-model estimating unit collects the co-occurrence count that the third language word w_j corresponds to φ_i words of the first language, using formula
  • 12. An apparatus for bilingual word alignment, comprising:
    a model obtaining unit configured to obtain a bilingual word alignment model for a first language and a third language based on a bilingual corpus of the first and second languages and a bilingual corpus of the second and third languages, by the apparatus for training bilingual word alignment model according to any one of claims 7 to 11; and
    a word-alignment unit configured to word-align a bilingual sentence pair of the first and third languages using the bilingual word alignment model for the first and third languages.
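
The co-occurrence count formula for the position distortion sub-model, C(j,i,l,m) = Σ over k and m′ of p_EJ(k|i,l,m′) * p_CE(j|k,m′,m), can be sketched in a few lines. The formula for converting counts into probabilities is not reproduced in this text, so the normalization over j below is an assumption; the function name `estimate_pivot_distortion`, the tuple-keyed dictionaries and the toy values are hypothetical.

```python
from collections import defaultdict


def estimate_pivot_distortion(p_ej, p_ce):
    """Estimate p_CJ(j | i, l, m) through the pivot language.

    p_ej[(k, i, l, m_prime)] = p_EJ(k | i, l, m_prime)
    p_ce[(j, k, m_prime, m)] = p_CE(j | k, m_prime, m)
    Returns a dict keyed by (j, i, l, m).
    """
    # Index p_ce by the pivot position and pivot sentence length.
    by_pivot = defaultdict(list)
    for (j, k, m_prime, m), prob in p_ce.items():
        by_pivot[(k, m_prime)].append((j, m, prob))

    # C(j,i,l,m) = sum over k, m' of p_EJ(k|i,l,m') * p_CE(j|k,m',m).
    counts = defaultdict(float)
    for (k, i, l, m_prime), prob_ej in p_ej.items():
        for j, m, prob_ce in by_pivot.get((k, m_prime), []):
            counts[(j, i, l, m)] += prob_ej * prob_ce

    # Assumed normalization: divide by the total over j for each (i, l, m).
    totals = defaultdict(float)
    for (j, i, l, m), c in counts.items():
        totals[(i, l, m)] += c
    return {key: c / totals[key[1:]]
            for key, c in counts.items() if totals[key[1:]] > 0.0}


# Toy, made-up tables: third-language length l=1, pivot length m'=2, first-language length m=2.
p_ej = {(1, 1, 1, 2): 0.75, (2, 1, 1, 2): 0.25}
p_ce = {(1, 1, 2, 2): 1.0, (1, 2, 2, 2): 0.5, (2, 2, 2, 2): 0.5}
p_cj = estimate_pivot_distortion(p_ej, p_ce)
```

Indexing `p_ce` by the pair (k, m′) first keeps the accumulation proportional to the number of matching entries rather than to the product of the two table sizes.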
Priority Claims (1)
Number            Date      Country  Kind
200610058067.6    Feb 2006  CN       national