Word alignment apparatus, example sentence bilingual dictionary, word alignment method, and program product for word alignment

Information

  • Patent Application
  • Publication Number
    20070174040
  • Date Filed
    July 26, 2006
  • Date Published
    July 26, 2007
Abstract
A word alignment apparatus includes a word extracting portion that extracts each word from an example sentence and from its translation sentence; an alignment calculator that calculates at least one of a similarity degree and an association degree between a word in a first language and a word in a second language and, on the basis of the calculated value, aligns the words of the example sentence (in the first language) with those of the translation sentence (in the second language); and an optimization portion that optimizes the alignment by performing bipartite graph matching.
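The optimization portion resolves competing candidate pairs globally rather than one pair at a time. The following is a minimal sketch of that step, assuming the alignment calculator has already produced a non-negative score for every (first-language word, second-language word) pair. The function name `max_weight_matching` and the score matrix are hypothetical, and the exact dynamic program over subsets of target words is a stand-in chosen for brevity, not the procedure of this application. For longer sentences, the Hungarian algorithm (for example SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`, discarding zero-score pairs afterward) scales far better.

```python
from functools import lru_cache


def max_weight_matching(weights):
    """Exact maximum weight bipartite matching for short sentences.

    weights[i][k] is a non-negative alignment score between source word i and
    target word k (hypothetical input format). Words may stay unaligned.
    Uses a dynamic program over subsets of target words, so it is only
    practical for roughly 15 or fewer target words.
    Returns (total_score, pairs) with pairs as (i, k) tuples.
    """
    n = len(weights)
    m = len(weights[0]) if n else 0

    @lru_cache(maxsize=None)
    def best(i, used):
        # Best total for source words i..n-1 when the target words in
        # bitmask `used` are already taken.
        if i == n:
            return 0.0, ()
        result = best(i + 1, used)  # option: leave source word i unaligned
        for k in range(m):
            w = weights[i][k]
            if w > 0 and not (used >> k) & 1:
                score, pairs = best(i + 1, used | (1 << k))
                if score + w > result[0]:
                    result = (score + w, ((i, k),) + pairs)
        return result

    return best(0, 0)


# Hypothetical scores: rows = "the", "red", "car"; columns = "la", "voiture", "rouge".
weights = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.6],
    [0.0, 0.5, 0.7],
]
score, pairs = max_weight_matching(weights)
print(round(score, 6), pairs)  # 2.0 ((0, 0), (1, 2), (2, 1))
```

With these made-up scores, a greedy pass that always takes the highest remaining score picks (the, la) at 0.9, then (car, rouge) at 0.7, then (red, voiture) at 0.2, for a total of 1.8. The matching instead returns (the, la), (red, rouge), (car, voiture) with a total of 2.0, which illustrates what a global bipartite matching can add over pairwise decisions.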
Description

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will be described in detail based on the following figures, wherein:

FIG. 1 is a functional block diagram illustrating the overall configuration of a word alignment apparatus according to an exemplary embodiment of the present invention;

FIG. 2 is a view showing a configuration of the word alignment apparatus;

FIG. 3A and FIG. 3B illustrate contents of a bilingual example sentence dictionary;

FIG. 4 is a flowchart describing an operation of a preprocessor;

FIG. 5 is a flowchart describing an operation of a corpus preprocessor;

FIG. 6A and FIG. 6B illustrate an example of morphological analysis performed by the preprocessor and by the corpus preprocessor;

FIG. 7 is a view illustrating an example of a corpus index table;

FIG. 8 shows a dictionary record structure stored in a word bilingual dictionary;

FIG. 9 illustrates parameters of an association degree;

FIG. 10 shows other calculation examples of the association degree;

FIG. 11A through FIG. 11C illustrate similarity degree/association degree lists;

FIG. 12 is a view illustrating a data structure that stores calculation results of the similarity degree and the association degree;

FIG. 13 is a view illustrating examples of the data structure;

FIG. 14A and FIG. 14B illustrate maximum weight matching on a bipartite graph;

FIG. 15 is a table showing an example of word alignment between an input example sentence and its translation sentence according to a second exemplary embodiment of the present invention;

FIG. 16 is a table comparing the recall and precision of the word alignment used in the second exemplary embodiment of the present invention with those of a conventional word alignment; and

FIG. 17 illustrates recall and precision in word alignment.


Claims
  • 1. A word alignment apparatus comprising: a word extracting portion that extracts each word from an example sentence and from a translation sentence thereof; an alignment calculator that calculates at least one of a similarity degree and an association degree between a word in a first language and that in a second language to perform an alignment between words respectively included in the example sentence in the first language and those included in the translation sentence thereof in the second language on the basis of a calculated value; and an optimization portion that optimizes the alignment by performing a bipartite graph matching.
  • 2. The word alignment apparatus according to claim 1, wherein the alignment calculator calculates the similarity degree between the words with reference to a word bilingual dictionary.
  • 3. The word alignment apparatus according to claim 2, wherein the similarity degree includes at least one of a shape similarity degree, a semantic similarity degree between the first language and the second language, and a POS similarity degree.
  • 4. The word alignment apparatus according to claim 1, wherein the alignment calculator calculates an association degree between the words with reference to corpus statistic information.
  • 5. The word alignment apparatus according to claim 4, wherein an association degree Ass(c, j) is calculated in an expression (A), where “c” denotes the word in the first language, “j” denotes the word in the second language, “a” denotes a co-occurrence frequency between a word “c” and a word “j”, freq(c) denotes an occurrence frequency of the word “c”, and freq(j) denotes the occurrence frequency of the word “j”.
  • 6. The word alignment apparatus according to claim 1, wherein the optimization portion optimizes the alignment by performing a weighted bipartite graph matching using, as weights, values of at least one of the similarity degree and the association degree that have been calculated by the alignment calculator.
  • 7. The word alignment apparatus according to claim 6, wherein the optimization portion optimizes the alignment by performing a maximum and minimum weight matching on a bipartite graph.
  • 8. The word alignment apparatus according to claim 1, wherein if at least one of the similarity degree and the association degree between the words is greater than a given threshold value, the optimization portion fixes the alignment between the words, and optimizes the alignment between the remaining words.
  • 9. The word alignment apparatus according to claim 1, wherein the word extracting portion performs a morphological analysis on the example sentence and the translation sentence, and extracts the word from the example sentence and the translation sentence.
  • 10. The word alignment apparatus according to claim 1, wherein the example sentence and the translation sentence are stored in an example sentence bilingual dictionary.
  • 11. The word alignment apparatus according to claim 1, further comprising a storage portion that stores the optimized alignment between the words.
  • 12. An example sentence bilingual dictionary comprising example sentences and translation sentences thereof that are aligned by a word alignment apparatus, the word alignment apparatus including: a word extracting portion that extracts each word from an example sentence and from a translation sentence; an alignment calculator that calculates at least one of a similarity degree and an association degree between a word in a first language and that in a second language to perform an alignment between words respectively included in the example sentence in the first language and those included in the translation sentence thereof in the second language on the basis of a calculated value; and an optimization portion that optimizes the alignment by performing a bipartite graph matching.
  • 13. A word alignment method comprising: extracting each word from an example sentence and from a translation sentence thereof; calculating at least one of a similarity degree and an association degree between a word in a first language and that in a second language to perform an alignment between words respectively included in the example sentence in the first language and those included in the translation sentence thereof in the second language on the basis of a calculated value; and optimizing the alignment by performing a bipartite graph matching.
  • 14. The word alignment method according to claim 13, wherein calculating calculates the similarity degree between the words with reference to a word bilingual dictionary.
  • 15. The word alignment method according to claim 13, wherein the similarity degree includes at least one of a shape similarity degree, a semantic similarity degree between the first language and the second language, and a POS similarity degree.
  • 16. The word alignment method according to claim 13, wherein calculating calculates an association degree between the words with reference to corpus statistic information.
  • 17. The word alignment method according to claim 16, wherein an association degree Ass(c, j) is calculated in an expression (A), where “c” denotes the word in the first language, “j” denotes the word in the second language, “a” denotes a co-occurrence frequency between a word “c” and a word “j”, freq(c) denotes an occurrence frequency of the word “c”, and freq(j) denotes the occurrence frequency of the word “j”.
  • 18. The word alignment method according to claim 13, wherein optimizing optimizes the alignment by performing a weighted bipartite graph matching using, as weights, values of at least one of the similarity degree and the association degree that have been calculated.
  • 19. The word alignment method according to claim 13, wherein optimizing optimizes the alignment by performing a maximum and minimum weight matching.
  • 20. The word alignment method according to claim 13, wherein if at least one of the similarity degree and the association degree between the words is greater than a given threshold value, optimizing fixes the alignment between the words, and optimizes the alignment between the remaining words.
  • 21. The word alignment method according to claim 13, wherein extracting extracts the example sentence and the translation sentence stored in an example sentence bilingual dictionary.
  • 22. The word alignment method according to claim 13, wherein extracting performs a morphological analysis on the example sentence and the translation sentence, and extracts the word from the example sentence and the translation sentence.
  • 23. The word alignment method according to claim 13, further comprising storing the optimized alignment between the words.
  • 24. A computer readable medium storing a program causing a computer to execute a process for word alignment, the process comprising: extracting each word from an example sentence and from a translation sentence thereof; calculating at least one of a similarity degree and an association degree between a word in a first language and that in a second language to perform an alignment between words respectively included in the example sentence in the first language and those included in the translation sentence thereof in the second language on the basis of a calculated value; and optimizing the alignment by performing a bipartite graph matching.
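Claims 4, 5, 8, 16, 17 and 20 combine corpus-based association degrees with a threshold that fixes confident pairs before the remaining words are matched. Expression (A) itself is not reproduced in this text, so the sketch below substitutes the Dice coefficient 2a / (freq(c) + freq(j)), with all frequencies counted per sentence pair. That substitution, the helper names `association_degrees` and `align`, the toy corpus, and the rule of fixing above-threshold pairs strongest first are assumptions for illustration only.

```python
from collections import Counter
from functools import lru_cache
from itertools import product


def association_degrees(corpus):
    """Stand-in for expression (A): Dice coefficient 2a / (freq(c) + freq(j)).

    corpus is a list of (source_words, target_words) sentence pairs; counts
    are taken per sentence pair (an assumption, not from the claims).
    """
    freq_c, freq_j, co = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_set, tgt_set = set(src), set(tgt)
        freq_c.update(src_set)
        freq_j.update(tgt_set)
        co.update(product(src_set, tgt_set))  # a: pairs containing both words
    return {(c, j): 2 * a / (freq_c[c] + freq_j[j]) for (c, j), a in co.items()}


def align(src, tgt, ass, threshold=0.9):
    """Fix pairs scoring above `threshold`, then match the remaining words."""
    # Step 1 (hypothetical rule): fix above-threshold pairs, strongest first.
    fixed, used_s, used_t = [], set(), set()
    candidates = sorted(
        ((ass.get((c, j), 0.0), i, k)
         for i, c in enumerate(src) for k, j in enumerate(tgt)),
        reverse=True,
    )
    for score, i, k in candidates:
        if score > threshold and i not in used_s and k not in used_t:
            fixed.append((i, k))
            used_s.add(i)
            used_t.add(k)

    # Step 2: exact maximum weight bipartite matching on the leftover words,
    # via a dynamic program over subsets of leftover target words.
    rest_s = [i for i in range(len(src)) if i not in used_s]
    rest_t = [k for k in range(len(tgt)) if k not in used_t]

    @lru_cache(maxsize=None)
    def best(p, used):
        if p == len(rest_s):
            return 0.0, ()
        result = best(p + 1, used)  # leave source word rest_s[p] unaligned
        for q, k in enumerate(rest_t):
            w = ass.get((src[rest_s[p]], tgt[k]), 0.0)
            if w > 0 and not (used >> q) & 1:
                score, pairs = best(p + 1, used | (1 << q))
                if score + w > result[0]:
                    result = (score + w, ((rest_s[p], k),) + pairs)
        return result

    return sorted(fixed + list(best(0, 0)[1]))


corpus = [
    (["the", "red", "car"], ["la", "voiture", "rouge"]),
    (["the", "car"], ["la", "voiture"]),
    (["the", "house"], ["la", "maison"]),
]
ass = association_degrees(corpus)
print(align(["the", "red", "car"], ["la", "voiture", "rouge"], ass))
# [(0, 0), (1, 2), (2, 1)]
```

Raising `threshold` above every score (for example `threshold=2.0`) skips the fixing step and leaves the whole sentence to the matcher. On this toy corpus both paths agree; fixing confident pairs mainly shrinks the search space that the matching step has to explore.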
Priority Claims (1)
  • Number: 2006-014468; Date: Jan 2006; Country: JP; Kind: national