This application is based upon and claims the benefit of priority from prior Chinese Patent Application No. 200710089195.1, filed on Mar. 21, 2007; the entire contents of which are incorporated herein by reference.
The present invention relates to information processing technology and, more particularly, to translation generation and machine translation based on bilingual alignment technology.
An Example-Based Machine Translation (EBMT) system is an automatic translation system that directly uses aligned bilingual example sentences as translation knowledge. For an inputted sentence to be translated, the translation system first retrieves a matched bilingual example sentence from an aligned bilingual example corpus by using a matching technology, and then extracts a translation fragment corresponding to a matched fragment from the bilingual example sentence by using the alignment information of the bilingual example sentence. Finally, the translation system combines these translation fragments into a translation of the inputted sentence.
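The extraction step described above can be illustrated with a minimal sketch. It assumes that the alignment information is a list of (source index, target index) word links and that the translation fragment is the contiguous target span covering all linked words; the function name, the data layout and the example sentence pair are hypothetical and are not taken from any particular EBMT system.

```python
from typing import List, Optional, Tuple


def extract_translation_fragment(
    target_words: List[str],
    alignment: List[Tuple[int, int]],
    src_start: int,
    src_end: int,
) -> Optional[List[str]]:
    """Return the target words aligned to the source span [src_start, src_end)."""
    # Target positions linked to any source word inside the matched fragment.
    linked = [t for s, t in alignment if src_start <= s < src_end]
    if not linked:
        return None  # the matched fragment has no aligned translation
    lo, hi = min(linked), max(linked)
    # Assumption: the fragment is the contiguous target span covering all links.
    return target_words[lo:hi + 1]


# Hypothetical example sentence pair with word alignment links.
source = ["I", "like", "red", "apples"]
target = ["j'", "aime", "les", "pommes", "rouges"]
links = [(0, 0), (1, 1), (2, 4), (3, 3)]

# Matched source fragment "red apples" occupies source positions 2..3.
fragment = extract_translation_fragment(target, links, 2, 4)
```

In this sketch an unaligned target word lying between two linked positions is kept inside the fragment; a real system may handle such words differently.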
In the current EBMT systems, there are two main approaches for the translation generation:
(1) Semantic Approach
This approach obtains an appropriate target language fragment for each part of the input sentence by using a thesaurus. The translation is then generated by recombining the target language fragments in a pre-defined order.
(2) Statistical Approach
This approach generates the translation by recombining target language fragments with a statistical language model.
The first approach does not take into account the transition between target language fragments. Therefore, the fluency of this kind of translation is poor.
The second approach can solve the fluency problem by using n-gram co-occurrence statistics. However, this method does not take into account the semantic relations between the example and the input sentence. As a result, the accuracy of this kind of translation is low.
Therefore, there is a need to provide a method for generating a translation and a method for machine translation that take both of the above-mentioned factors into account simultaneously.
In order to solve the above-mentioned problems in the prior art, the present invention provides a method and an apparatus for generating a translation and for machine translation.
According to an aspect of the present invention, there is provided a method for generating a translation, wherein a sentence of a first language to be translated is split into a plurality of fragments, an aligned bilingual example corpus comprises a plurality of example sentence pairs of the first language and a second language and alignment information between each sentence pair, and comprises at least one translation fragment of the second language corresponding to each of the above-mentioned plurality of fragments of the first language; the method comprising: selecting an optimum translation fragment combination of the second language from a plurality of possible translation fragment combinations of the second language corresponding to the sentence of the first language based on an integrated score obtained from a plurality of feature functions on a translation fragment combination; and generating the translation of the second language based on the above-mentioned optimum translation fragment combination.
According to another aspect of the present invention, there is provided a method for generating a translation, wherein an aligned bilingual example corpus comprises a plurality of example sentence pairs of a first language and a second language and alignment information between each sentence pair, a sentence of the first language to be translated is matched with respect to the above-mentioned aligned bilingual example corpus, and at least one translation fragment of the second language corresponding to each possible fragment of the above-mentioned sentence of the first language is obtained; the method comprising: selecting an optimum translation fragment combination of the second language by using a search algorithm, wherein an integrated score is obtained from a plurality of feature functions on a possible translation fragment or a combination of translation fragments as a cost of the above-mentioned search algorithm; and generating the translation of the second language based on the above-mentioned optimum translation fragment combination.
According to another aspect of the present invention, there is provided a method for machine translation, wherein an aligned bilingual example corpus comprises a plurality of example sentence pairs of a first language and a second language and alignment information between each sentence pair; the method comprising: splitting a sentence of the first language to be translated into a plurality of fragments; and generating the translation of the second language by means of the above-mentioned method for generating a translation.
According to another aspect of the present invention, there is provided a method for machine translation, wherein an aligned bilingual example corpus comprises a plurality of example sentence pairs of a first language and a second language and alignment information between each sentence pair; the method comprising: matching a sentence of the first language to be translated with respect to the above-mentioned aligned bilingual example corpus to obtain at least one translation fragment of the second language corresponding to each possible fragment of the above-mentioned sentence of the first language; and generating the translation of the second language by means of the above-mentioned method for generating a translation.
According to another aspect of the present invention, there is provided an apparatus for generating a translation, wherein a sentence of a first language to be translated is split into a plurality of fragments, an aligned bilingual example corpus comprises a plurality of example sentence pairs of the first language and a second language and alignment information between each sentence pair, and comprises at least one translation fragment of the second language corresponding to each of the above-mentioned plurality of fragments of the first language; the apparatus comprising: a selecting unit configured to select an optimum translation fragment combination of the second language from a plurality of possible translation fragment combinations of the second language corresponding to the above-mentioned sentence of the first language based on an integrated score obtained from a plurality of feature functions on a translation fragment combination; and a translation generating unit configured to generate the translation of the second language based on the above-mentioned optimum translation fragment combination.
According to another aspect of the present invention, there is provided an apparatus for generating a translation, wherein an aligned bilingual example corpus comprises a plurality of example sentence pairs of a first language and a second language and alignment information between each sentence pair, a sentence of the first language to be translated is matched with respect to the above-mentioned aligned bilingual example corpus, and at least one translation fragment of the second language corresponding to each possible fragment of the above-mentioned sentence of the first language is obtained; the apparatus comprising: a selecting unit configured to select an optimum translation fragment combination of the second language by using a search algorithm, wherein an integrated score is obtained from a plurality of feature functions on a possible translation fragment or a combination of translation fragments as a cost of the above-mentioned search algorithm; and a translation generating unit configured to generate the translation of the second language based on the above-mentioned optimum translation fragment combination.
According to another aspect of the present invention, there is provided an apparatus for machine translation, wherein an aligned bilingual example corpus comprises a plurality of example sentence pairs of a first language and a second language and alignment information between each sentence pair; the apparatus comprising: a splitting unit configured to split a sentence of the first language to be translated into a plurality of fragments; and the above-mentioned apparatus for generating a translation configured to generate the translation of the second language.
According to another aspect of the present invention, there is provided an apparatus for machine translation, wherein an aligned bilingual example corpus comprises a plurality of example sentence pairs of a first language and a second language and alignment information between each sentence pair; the apparatus comprising: a matching unit configured to match a sentence of the first language to be translated with respect to the above-mentioned aligned bilingual example corpus to obtain at least one translation fragment of the second language corresponding to each possible fragment of the above-mentioned sentence of the first language; and the above-mentioned apparatus for generating a translation configured to generate the translation of the second language.
Next, a detailed description of each embodiment of the present invention will be given in conjunction with the accompanying drawings.
Method for Generating a Translation
Specifically, in this embodiment, the sentence of the first language to be translated is split into a plurality of fragments manually or automatically, and one or more translation fragments of the second language corresponding to each of the plurality of fragments are retrieved from an aligned bilingual example corpus by matching. The aligned bilingual example corpus is a bilingual example corpus that has been word-aligned either manually by a professional (for example, a translator) or automatically by a computer, and it comprises a plurality of example sentence pairs of the first language and the second language and alignment information between each sentence pair. It should be understood that the present invention places no special limitation on the method for splitting a sentence of the first language to be translated, and any method known in the art can be used, provided that the sentence to be translated can be split into effective fragments, that is, fragments whose translation fragments can be found in the aligned bilingual example corpus.
Next, a detailed description of the plurality of feature functions and a calculating process of the integrated score obtained from a plurality of feature functions on a translation fragment combination will be given.
In this embodiment, the above-mentioned feature functions represent the kinds of translation knowledge contained in the translation generating model of a machine translation system based on bilingual example sentences (in the model, each kind of translation knowledge is called a feature function), for example, the similarity between a bilingual example sentence and an inputted sentence, the reliability of a bilingual example sentence, and the fluency of a generated translation.
The feature functions of the embodiment include, but are not limited to, the following kinds:
A a translation probability of a word from a source language to a target language
B a translation probability of a word from a target language to a source language
C a translation probability of a phrase from a source language to a target language
D a translation probability of a phrase from a target language to a source language
E a selection probability of a target language based on length
$h_{TLS}(e,f,E) = h_{TLS}(e,f) = \log p(I \mid J)$
With respect to a sentence to be translated, this function gives a smaller value to a translation that is too short or too long.
F a target language model
The larger the value of this feature function, the better the fluency of the generated translation.
G a semantic similarity
The larger the value of this feature function, the closer in meaning the corresponding fragments in a bilingual example sentence and an inputted sentence are.
In the above-mentioned plurality of feature functions:
h denotes a feature;
f denotes a sentence to be translated;
e denotes a translation generated;
ei denotes a word of a translation;
fi denotes a word of an inputted sentence;
e′i denotes a phrase of a translation;
f′i denotes a phrase of an inputted sentence;
ai denotes a unit number aligning with the ith unit;
I denotes length of e;
J denotes length of f; and
M(z,f) denotes semantic similarity between corresponding fragments in a bilingual example sentence and an inputted sentence.
Specifically, the feature functions A, B and E are described in the doctoral dissertation “Noun Phrase Translation”, Philipp Koehn, University of Southern California, 2003, which is incorporated herein by reference (hereinafter reference 1).
The feature functions C and D are described in the article “Discriminative training and maximum entropy models for statistical machine translation”, Franz Josef Och and Hermann Ney, published in 2002 in Proceedings of the 40th Annual Meeting of the ACL, pages 295-302, which is incorporated herein by reference (hereinafter reference 2).
The feature function F is described in the article “SRILM—an extensible language modeling toolkit”, Andreas Stolcke, published in 2002 in Proceedings of the International Conference on Spoken Language Processing, volume 2, pages 901-904, which is incorporated herein by reference (hereinafter reference 3).
The feature function G is described in the article “Example-based machine translation based on TSC and statistical generation”, Liu Zhanyi, Wang Haifeng and Wu Hua, MT Summit X, Phuket, Thailand, Sep. 13-15, 2005, which is incorporated herein by reference (hereinafter reference 4).
In this embodiment, the above-mentioned feature functions A-G are shown; however, it should be understood that the present invention is not limited to these, and any feature function that contributes to generating a translation can be included.
Next, a detailed description of a calculating process of an integrated score obtained from the above-mentioned plurality of feature functions on a translation fragment combination will be given in conjunction with
wherein hm denotes the mth feature function, λm denotes the weight of the mth feature function, f denotes the sentence of the first language to be translated, e denotes the translation fragment combination of the second language, E denotes a collection of translation fragments required to generate e, and s(e) denotes the integrated score obtained from the plurality of feature functions on e.
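Under these definitions, and assuming the log-linear combination referred to in this embodiment, the integrated score and the selection of the optimum combination can be written as follows. The symbol M (the number of feature functions) and the notation ê for the optimum combination are introduced here for illustration only.

```latex
s(e) = \sum_{m=1}^{M} \lambda_m \, h_m(e, f, E)
\qquad
\hat{e} = \arg\max_{e} \; s(e)
```

Setting every weight to 1 gives the unweighted case, in which the scores of the individual feature functions are simply summed.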
In this embodiment, the weight of each feature function is preferably taken into account, wherein a method for training the weight of a feature function is described in the article “Minimum error rate training in statistical machine translation”, Franz Josef Och, published in 2003 in Proceedings of the 41st Annual Meeting of the ACL, pages 160-167, which is incorporated herein by reference (hereinafter reference 5). However, it should be understood that the above-mentioned integrated score can also be calculated directly, by integrating the scores obtained from each feature function on the translation fragment combination with a log-linear model, without taking into account the weight of each feature function.
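The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the two toy feature functions (`h_length` and `h_fragment_count`) and the weights are hypothetical stand-ins for the feature functions A-G and for weights trained as in reference 5.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A feature function h_m(e, f, E) that returns a score for combination e.
FeatureFunction = Callable[[str, str, Sequence[str]], float]


def integrated_score(
    e: str,
    f: str,
    E: Sequence[str],
    features: List[FeatureFunction],
    weights: Optional[List[float]] = None,
) -> float:
    """Weighted sum of feature scores; all weights default to 1 (unweighted)."""
    if weights is None:
        weights = [1.0] * len(features)
    return sum(w * h(e, f, E) for w, h in zip(weights, features))


def select_optimum(
    candidates: List[Tuple[str, ...]],
    f: str,
    features: List[FeatureFunction],
    weights: Optional[List[float]] = None,
) -> Tuple[str, ...]:
    """Return the translation fragment combination with the highest score."""
    return max(
        candidates,
        key=lambda E: integrated_score(" ".join(E), f, E, features, weights),
    )


# Hypothetical toy features, for illustration only.
def h_length(e: str, f: str, E: Sequence[str]) -> float:
    # Penalizes translations whose length differs from the input length.
    return -float(abs(len(e.split()) - len(f.split())))


def h_fragment_count(e: str, f: str, E: Sequence[str]) -> float:
    # Prefers combinations built from fewer fragments.
    return -float(len(E))


sentence = "a b c"
candidates = [("x y", "z"), ("x",)]
best = select_optimum(candidates, sentence, [h_length, h_fragment_count], [1.0, 0.5])
```

Passing `weights=None` corresponds to the unweighted calculation mentioned above.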
At Step 101, the integrated score of each of all translation fragment combinations can be calculated with the above-mentioned plurality of feature functions by using the above-mentioned method shown in
Optionally, in this embodiment, an optimum translation fragment combination of the second language can also be selected from a plurality of translation fragment combinations of the second language corresponding to the sentence of the first language by using a search algorithm. The search algorithm may be any algorithm known in the art, for example, a Beam search algorithm, an A search algorithm or an A* search algorithm, and the present invention places no special limitation on this. A detailed description of the process of a search algorithm will be given in the embodiment of
Optionally, in this embodiment, the sentence of the first language to be translated can be split according to a plurality of splitting schemes; for example, the sentence to be translated is split automatically by a splitting algorithm based on all sentence fragments found. For example:
A sentence to be translated=“w1 w2 w3 w4 w5 w6 w7 w8 w9”
The effective fragments comprise:
f1=w1 w2 w3
f2=w4 w5 w6
f3=w7 w8 w9
f4=w1 w2 w3 w4
f5=w5 w6 w7 w8 w9
The above fragments can be combined into two splitting schemes, “f1 f2 f3” and “f4 f5”.
For the first splitting scheme “f1 f2 f3”, an optimum translation fragment combination of the second language is selected by using the above-mentioned method described at Step 101, wherein integrated scores of all translation fragment combinations of the splitting scheme “f1 f2 f3” are calculated with the above-mentioned plurality of feature functions by using the above-mentioned method shown in
For the second splitting scheme “f4 f5”, an optimum translation fragment combination of the second language is selected by using the above-mentioned method described at Step 101, wherein integrated scores of all translation fragment combinations of the splitting scheme “f4 f5” are calculated with the above-mentioned plurality of feature functions by using the above-mentioned method shown in
Then, the integrated scores of the optimum translation fragment combinations of the two splitting schemes are compared; the translation fragment combination with the higher score is kept, and the one with the lower score is eliminated. The optimum translation fragment combination of the second language is thereby obtained for the sentence of the first language to be translated.
Further, the optimum translation fragment combination of the second language can also be selected from a plurality of translation fragment combinations of the second language corresponding to the sentence of the first language by using a search algorithm with respect to the first splitting scheme “f1 f2 f3” and the second splitting scheme “f4 f5”.
It should be understood that, although two splitting schemes are shown herein, the present invention is not limited to this, and more than two splitting schemes may be used; in that case, the optimum translation fragment combination of each splitting scheme is calculated, the results of the splitting schemes are compared, and the optimum translation fragment combination of the second language is finally obtained.
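The enumeration of splitting schemes in the example above can be sketched as follows. This is a minimal illustration that assumes each effective fragment is a contiguous word sequence and that a splitting scheme must cover the sentence from left to right without gaps or overlaps; the function name `splitting_schemes` is hypothetical.

```python
from typing import Dict, List, Tuple


def splitting_schemes(
    words: List[str], fragments: Dict[str, Tuple[str, ...]]
) -> List[List[str]]:
    """Enumerate every way to cover `words` exactly with effective fragments."""
    schemes: List[List[str]] = []

    def extend(pos: int, chosen: List[str]) -> None:
        if pos == len(words):
            schemes.append(chosen)  # the whole sentence is covered
            return
        for name, frag in fragments.items():
            # A non-empty fragment can be used if it matches the words at `pos`.
            if frag and tuple(words[pos:pos + len(frag)]) == frag:
                extend(pos + len(frag), chosen + [name])

    extend(0, [])
    return schemes


sentence = "w1 w2 w3 w4 w5 w6 w7 w8 w9".split()
effective_fragments = {
    "f1": ("w1", "w2", "w3"),
    "f2": ("w4", "w5", "w6"),
    "f3": ("w7", "w8", "w9"),
    "f4": ("w1", "w2", "w3", "w4"),
    "f5": ("w5", "w6", "w7", "w8", "w9"),
}
schemes = splitting_schemes(sentence, effective_fragments)
```

For the fragments of the example, the sketch yields the two schemes “f1 f2 f3” and “f4 f5”; each scheme would then be scored separately and the scheme whose optimum combination has the higher integrated score kept.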
Finally, at Step 105, the translation of the second language is generated based on the above-mentioned optimum translation fragment combination.
By using the method for generating a translation of the embodiment, aligned bilingual example sentences are used as translation knowledge (namely, feature functions), so that the efficiency of generating a translation is effectively improved relative to rule-based methods for generating a translation. At the same time, this method can generate a translation of better quality in a specific application.
Further, with the method for generating a translation of the embodiment, a generated translation is evaluated from different points of view by using a plurality of kinds of translation knowledge, so that a high-quality translation is obtained. For example, since the translation knowledge used comprises semantic resources and a target language model, the generated translation is fluent and its semantic similarity to the inputted sentence is high.
Further, the method for generating a translation of the embodiment can be extended by adding new translation knowledge, thereby the quality of the translation can be further improved.
Method for Generating a Translation
Under the same inventive conception,
As shown in
Specifically, in this embodiment, one or more translation fragments of the second language corresponding to each possible fragment of the sentence of the first language to be translated are retrieved from an aligned bilingual example corpus by matching. The aligned bilingual example corpus is a bilingual example corpus that has been word-aligned either manually by a professional (for example, a translator) or automatically by a computer, and it comprises a plurality of example sentence pairs of the first language and the second language and alignment information between each sentence pair. It should be understood that the present invention places no special limitation on the method for matching a sentence of the first language to be translated, and any method known in the art can be used, provided that a corresponding translation fragment can be found in the aligned bilingual example corpus for each possible fragment of the sentence to be translated.
In this embodiment, the search algorithm may be any algorithm known in the art, for example, a Beam search algorithm, an A search algorithm or an A* search algorithm, and the present invention places no special limitation on this. A detailed description of the process of a search algorithm will be given in conjunction with
In the embodiment of
A sentence fragment: There is a red jacket on the bed
A translation fragment:
In
S: a sign; a word that has been translated is marked with “*”, and a word that has not been translated is marked with “-”;
T: a translation of the word with “*”;
Score: an integrated score of the translation obtained.
Specifically, the Beam search algorithm is performed as follows:
First, the lists S[0] to S[9] (words=0 . . . 9) are initialized;
Next, for s=0 to 9:
Extending each status in S[s]
Each new status is stored in the corresponding list based on its status sign. If the number of words translated in the status is x, the status is stored in the list of words=x.
If the list already contains a status identical to the new status, the two statuses are compared, and the status with the higher score is kept.
Pruning the list
If the number of statuses in a list exceeds a predetermined threshold, the statuses with the lowest scores are pruned.
Finally, the translation fragment combination with the highest score in the list S[9] is selected as the optimum translation fragment combination of the second language for the sentence of the first language to be translated.
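The steps above can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the embodiment: a status is identified only by its sign (which words are translated), the score of a status is the sum of hypothetical per-fragment scores, and translation fragments are concatenated in the order in which they are added. The names `beam_search` and `toy_fragments` and all data are hypothetical.

```python
from typing import Dict, List, Tuple

# (start, end, translation, score): source words [start, end) and a toy score.
Fragment = Tuple[int, int, str, float]
# A status: (integrated score, translation fragments chosen so far).
Status = Tuple[float, Tuple[str, ...]]


def beam_search(n_words: int, fragments: List[Fragment], beam_size: int = 5) -> Status:
    # lists[x] holds the statuses with exactly x translated words,
    # keyed by the status sign stored as a bitmask of translated words.
    lists: List[Dict[int, Status]] = [dict() for _ in range(n_words + 1)]
    lists[0][0] = (0.0, ())
    for s in range(n_words + 1):
        # Pruning: lists[s] is complete here, so keep only the best statuses.
        if len(lists[s]) > beam_size:
            ranked = sorted(lists[s].items(), key=lambda kv: kv[1][0], reverse=True)
            lists[s] = dict(ranked[:beam_size])
        # Extending each status in lists[s].
        for mask, (score, parts) in list(lists[s].items()):
            for start, end, text, frag_score in fragments:
                frag_mask = ((1 << (end - start)) - 1) << start
                if mask & frag_mask:
                    continue  # some of these words are already translated
                new_mask = mask | frag_mask
                x = bin(new_mask).count("1")  # number of translated words
                new_status = (score + frag_score, parts + (text,))
                old = lists[x].get(new_mask)
                # Identical sign: keep the status with the higher score.
                if old is None or new_status[0] > old[0]:
                    lists[x][new_mask] = new_status
    if not lists[n_words]:
        raise ValueError("no fragment combination covers the whole sentence")
    return max(lists[n_words].values(), key=lambda st: st[0])


# Hypothetical fragments for a four-word sentence.
toy_fragments = [
    (0, 2, "A", -1.0),
    (2, 4, "B", -1.0),
    (0, 1, "a", -0.5),
    (1, 2, "b", -1.0),
    (0, 4, "AB", -2.5),
]
best_score, best_parts = beam_search(4, toy_fragments)
```

In a fuller implementation, the toy per-fragment scores would be replaced by the integrated score obtained from the plurality of feature functions, and the status would also need to record whatever context those feature functions depend on.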
In the above-mentioned search algorithm, the integrated score obtained from a plurality of feature functions on each translation fragment or each fragment combination is calculated based on the method of the above-mentioned embodiment of
Finally, at Step 405, the translation of the second language is generated based on the above-mentioned optimum translation fragment combination.
By using the method for generating a translation of the embodiment, aligned bilingual example sentences are used as translation knowledge (namely, feature functions), so that the efficiency of generating a translation is effectively improved relative to rule-based methods for generating a translation. At the same time, this method can generate a translation of better quality in a specific application.
Further, with the method for generating a translation of the embodiment, a generated translation is evaluated from different points of view by using a plurality of kinds of translation knowledge, so that a high-quality translation is obtained. For example, since the translation knowledge used comprises semantic resources and a target language model, the generated translation is fluent and its semantic similarity to the inputted sentence is high.
Further, the method for generating a translation of the embodiment can be extended by adding new translation knowledge, thereby the quality of the translation can be further improved.
Further, the method for generating a translation of the embodiment does not need to split a sentence of the first language to be translated in advance; a high-quality translation can be generated simply by using a search algorithm.
Method for Machine Translation
Under the same inventive conception,
As shown in
Specifically, in this embodiment, the sentence of the first language to be translated is split into a plurality of fragments manually or automatically, and one or more translation fragments of the second language corresponding to each of the plurality of fragments are retrieved from an aligned bilingual example corpus by matching. The aligned bilingual example corpus is a bilingual example corpus that has been word-aligned either manually by a professional (for example, a translator) or automatically by a computer, and it comprises a plurality of example sentence pairs of the first language and the second language and alignment information between each sentence pair. It should be understood that the present invention places no special limitation on the method for splitting a sentence of the first language to be translated, and any method known in the art can be used, provided that the sentence to be translated can be split into effective fragments, that is, fragments whose translation fragments can be found in the aligned bilingual example corpus.
Next, at Step 505, the translation of the second language is generated by means of the above-mentioned method for generating a translation of the embodiment of
By using the method for machine translation of the embodiment, aligned bilingual example sentences are used as translation knowledge (namely, feature functions), so that the efficiency of machine translation is effectively improved relative to rule-based methods for machine translation. At the same time, this method can generate a translation of better quality in a specific application.
Further, with the method for machine translation of the embodiment, a generated translation is evaluated from different points of view by using a plurality of kinds of translation knowledge, so that a high-quality translation is obtained. For example, since the translation knowledge used comprises semantic resources and a target language model, the generated translation is fluent and its semantic similarity to the inputted sentence is high.
Further, the method for machine translation of the embodiment can be extended by adding new translation knowledge, thereby the quality of the translation can be further improved.
Method for Machine Translation
Under the same inventive conception,
As shown in
Specifically, in this embodiment, one or more translation fragments of the second language corresponding to each possible fragment of the sentence of the first language to be translated are retrieved from an aligned bilingual example corpus by matching. The aligned bilingual example corpus is a bilingual example corpus that has been word-aligned either manually by a professional (for example, a translator) or automatically by a computer, and it comprises a plurality of example sentence pairs of the first language and the second language and alignment information between each sentence pair. It should be understood that the present invention places no special limitation on the method for matching a sentence of the first language to be translated, and any method known in the art can be used, provided that a corresponding translation fragment can be found in the aligned bilingual example corpus for each possible fragment of the sentence to be translated.
Next, at Step 605, the translation of the second language is generated by means of the above-mentioned method for generating a translation of the embodiment of
By using the method for machine translation of the embodiment, aligned bilingual example sentences are used as translation knowledge (namely, feature functions), so that the efficiency of machine translation is effectively improved relative to rule-based methods for machine translation. At the same time, this method can generate a translation of better quality in a specific application.
Further, with the method for machine translation of the embodiment, a generated translation is evaluated from different points of view by using a plurality of kinds of translation knowledge, so that a high-quality translation is obtained. For example, since the translation knowledge used comprises semantic resources and a target language model, the generated translation is fluent and its semantic similarity to the inputted sentence is high.
Further, the method for machine translation of the embodiment can be extended by adding new translation knowledge, thereby the quality of the translation can be further improved.
Further, the method for machine translation of the embodiment does not need to split a sentence of the first language to be translated in advance; a high-quality translation can be generated simply by using a search algorithm.
Apparatus for Generating a Translation
Under the same inventive conception,
As shown in
Specifically, in this embodiment, the sentence of the first language to be translated is split into a plurality of fragments manually or automatically, and one or more translation fragments of the second language corresponding to each of the plurality of fragments are retrieved from an aligned bilingual example corpus by matching. The aligned bilingual example corpus is a bilingual example corpus that has been word-aligned either manually by a professional (for example, a translator) or automatically by a computer, and it comprises a plurality of example sentence pairs of the first language and the second language and alignment information between each sentence pair. It should be understood that the present invention places no special limitation on the method for splitting a sentence of the first language to be translated, and any method known in the art can be used, provided that the sentence to be translated can be split into effective fragments, that is, fragments whose translation fragments can be found in the aligned bilingual example corpus.
Next, a detailed description of the above-mentioned plurality of feature functions and a calculating process of an integrated score obtained from a plurality of feature functions on a translation fragment combination calculated by the calculating unit 701 will be given.
In this embodiment, the above-mentioned feature functions represent the kinds of translation knowledge contained in the translation generating model of a machine translation system based on bilingual example sentences (in the model, each kind of translation knowledge is called a feature function), for example, the similarity between a bilingual example sentence and an inputted sentence, the reliability of a bilingual example sentence, and the fluency of a generated translation.
The feature functions of the embodiment include, but are not limited to, the following kinds:
A a translation probability of a word from a source language to a target language
B a translation probability of a word from a target language to a source language
C a translation probability of a phrase from a source language to a target language
D a translation probability of a phrase from a target language to a source language
E a selection probability of a target language based on length
$h_{TLS}(e,f,E) = h_{TLS}(e,f) = \log p(I \mid J)$
With respect to a sentence to be translated, this function gives a smaller value to a translation that is too short or too long.
F a target language model
The larger the value of this feature function, the better the fluency of the generated translation.
G a semantic similarity
The larger the value of this feature function, the closer in meaning the corresponding fragments in a bilingual example sentence and an inputted sentence are.
In the above-mentioned plurality of feature functions:
h denotes a feature;
f denotes a sentence to be translated;
e denotes a translation generated;
ei denotes a word of a translation;
fi denotes a word of an inputted sentence;
e′i denotes a phrase of a translation;
f′i denotes a phrase of an inputted sentence;
ai denotes a unit number aligning with the ith unit;
I denotes length of e;
J denotes length of f; and
M(z,f) denotes a semantic similarity between corresponding fragments in a bilingual example sentence and an inputted sentence.
Specifically, the feature functions A, B and E are described in the above-mentioned reference 1.
The feature functions C and D are described in the above-mentioned reference 2.
The feature function F is described in the above-mentioned reference 3.
The feature function G is described in the above-mentioned reference 4.
Although the feature functions A-G are shown in this embodiment, it should be understood that the present invention is not limited to them, and any feature function that contributes to generating a translation can be included.
Next, a detailed description of a calculating process of an integrated score obtained from the above-mentioned plurality of feature functions on a translation fragment combination will be given in conjunction with
wherein hm denotes the mth feature function, λm denotes the weight of the mth feature function, f denotes the sentence of the first language to be translated, e denotes the translation fragment combination of the second language, E denotes a collection of translation fragments required to generate e, and s(e) denotes the integrated score obtained from the plurality of feature functions on e.
In this embodiment, the calculating unit 701 preferably takes the weight of each feature function into account when calculating the integrated score obtained from the plurality of feature functions on a translation fragment combination; a method for training the weights of the feature functions is described in the above-mentioned reference 5. However, it should be understood that the integrated score can also be calculated directly with a log-linear model by combining the scores obtained from each feature function on the translation fragment combination, without taking the weight of each feature function into account.
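The weighted and unweighted calculations described above can be sketched as follows. This is only an illustration, assuming that the integrated score s(e) is the weighted sum of the feature values hm(e, f, E), which is the usual form of a log-linear model. The two toy feature functions and all names in the code are hypothetical placeholders and are not the feature functions A-G of the embodiment.

```python
from typing import Callable, Optional, Sequence

# A feature function maps (e, f, E) to a log-domain value, following the
# notation h_m(e, f, E) used in the text.
Feature = Callable[[str, str, Sequence[str]], float]


def integrated_score(
    e: str,
    f: str,
    E: Sequence[str],
    features: Sequence[Feature],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Assumed log-linear form: s(e) = sum over m of lambda_m * h_m(e, f, E)."""
    if weights is None:
        # Unweighted variant: every feature function contributes equally.
        weights = [1.0] * len(features)
    if len(weights) != len(features):
        raise ValueError("one weight is required per feature function")
    return sum(w * h(e, f, E) for w, h in zip(weights, features))


# Hypothetical toy features, for illustration only.
def toy_length(e: str, f: str, E: Sequence[str]) -> float:
    # Penalizes a difference between the lengths I and J.
    return -float(abs(len(e.split()) - len(f.split())))


def toy_constant(e: str, f: str, E: Sequence[str]) -> float:
    # Stands in for any other log-probability feature.
    return -3.0


weighted = integrated_score("a b", "x y z", ["a", "b"], [toy_length, toy_constant], [2.0, 0.5])
unweighted = integrated_score("a b", "x y z", ["a", "b"], [toy_length, toy_constant])
```

With these toy values, `weighted` is -3.5 and `unweighted` is -4.0; in practice the weights would come from a training procedure rather than being set by hand.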
In this embodiment, the selecting unit 705 selects the translation fragment combination with the highest score as the optimum translation fragment combination of the second language, based on the integrated scores obtained from the above-mentioned plurality of feature functions on all translation fragment combinations, which are calculated by the calculating unit 701 by using the above-mentioned method shown in
Optionally, in this embodiment, the selecting unit 705 can also select the optimum translation fragment combination of the second language from a plurality of translation fragment combinations of the second language corresponding to the sentence of the first language by using a searching unit. In this embodiment, the searching unit may be any searching unit known in the art, for example, a searching unit that performs a beam search algorithm, an A search algorithm, an A* search algorithm or the like, and the present invention places no special limitation on this. A detailed description of the process of a search algorithm will be given in the embodiment of
Optionally, in this embodiment, the sentence of the first language to be translated can be split according to a plurality of splitting schemes; for instance, the sentence to be translated is split automatically by a splitting algorithm based on all the sentence fragments found. For example:
A sentence to be translated=“w1 w2 w3 w4 w5 w6 w7 w8 w9”
The effective fragments comprise:
F1=w1 w2 w3
F2=w4 w5 w6
F3=w7 w8 w9
F4=w1 w2 w3 w4
F5=w5 w6 w7 w8 w9
The above fragments can compose two splitting schemes, “f1 f2 f3” and “f4 f5”.
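The way the effective fragments combine into splitting schemes can be sketched as a small recursive enumeration. This is an illustrative sketch only, not the splitting algorithm of the embodiment; the function name `splitting_schemes` is hypothetical, and F1 is taken to be “w1 w2 w3”.

```python
from typing import Dict, List


def splitting_schemes(words: List[str], fragments: Dict[str, str]) -> List[List[str]]:
    """Return every way to cover `words` exactly with consecutive fragments."""
    named = {name: tuple(text.split()) for name, text in fragments.items()}
    schemes: List[List[str]] = []

    def extend(pos: int, chosen: List[str]) -> None:
        if pos == len(words):
            # The whole sentence is covered: record one splitting scheme.
            schemes.append(list(chosen))
            return
        for name, frag in named.items():
            # A fragment can be used only if it matches the words at `pos`.
            if frag and tuple(words[pos:pos + len(frag)]) == frag:
                chosen.append(name)
                extend(pos + len(frag), chosen)
                chosen.pop()

    extend(0, [])
    return schemes


sentence = "w1 w2 w3 w4 w5 w6 w7 w8 w9".split()
effective = {
    "F1": "w1 w2 w3",
    "F2": "w4 w5 w6",
    "F3": "w7 w8 w9",
    "F4": "w1 w2 w3 w4",
    "F5": "w5 w6 w7 w8 w9",
}
schemes = splitting_schemes(sentence, effective)
# schemes == [["F1", "F2", "F3"], ["F4", "F5"]]
```

Exhaustive enumeration is adequate for a handful of fragments; for long sentences with many overlapping fragments the number of schemes can grow quickly.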
For the first splitting scheme “f1 f2 f3”, an optimum translation fragment combination of the second language is selected by using the selecting unit 705, wherein integrated scores obtained from the above-mentioned plurality of feature functions on all translation fragment combinations of the splitting scheme “f1 f2 f3” are calculated by the calculating unit 701 by using the above-mentioned method shown in
For the second splitting scheme “f4 f5”, an optimum translation fragment combination of the second language is selected by using the selecting unit 705, wherein integrated scores obtained from the above-mentioned plurality of feature functions on all translation fragment combinations of the splitting scheme “f4 f5” are calculated by the calculating unit 701 by using the above-mentioned method shown in
Then, the integrated scores of the optimum translation fragment combinations of the two splitting schemes are compared; the translation fragment combination with the higher score is kept, and the translation fragment combination with the lower score is eliminated. In this way, the optimum translation fragment combination of the second language is obtained for the sentence of the first language to be translated.
Further, with respect to the first splitting scheme “f1 f2 f3” and the second splitting scheme “f4 f5”, the selecting unit 705 can also select the optimum translation fragment combination of the second language from a plurality of translation fragment combinations of the second language corresponding to the sentence of the first language by using a searching unit.
It should be understood that, although two splitting schemes are shown herein, the present invention is not limited to this, and there can be more than two splitting schemes; in that case, the optimum translation fragment combination of each splitting scheme is calculated, the results of the plurality of splitting schemes are compared, and the optimum translation fragment combination of the second language is finally obtained.
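The comparison across splitting schemes can be sketched as follows. This is a minimal illustration under stated assumptions: the candidate translation fragments (“a1”, “d1” and so on) and the additive toy score are hypothetical, and `toy_score` merely stands in for the integrated score calculated from the feature functions.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Combination = Tuple[str, ...]


def best_for_scheme(
    candidates_per_fragment: Sequence[Sequence[str]],
    score: Callable[[Combination], float],
) -> Tuple[Combination, float]:
    """Score every translation fragment combination of one splitting scheme."""
    best = max(product(*candidates_per_fragment), key=score)
    return best, score(best)


def best_over_schemes(
    schemes: Dict[str, Sequence[Sequence[str]]],
    score: Callable[[Combination], float],
) -> Tuple[Combination, float]:
    """Keep the optimum of each scheme, then keep the highest-scoring one."""
    per_scheme = [best_for_scheme(c, score) for c in schemes.values()]
    return max(per_scheme, key=lambda result: result[1])


# Hypothetical per-fragment log scores standing in for the integrated score.
TOY = {"a1": -1.0, "a2": -0.5, "b1": -0.25, "c1": -0.25, "c2": -1.0,
       "d1": -0.5, "d2": -0.75, "e1": -0.25}


def toy_score(combination: Combination) -> float:
    return sum(TOY[fragment] for fragment in combination)


schemes_with_candidates: Dict[str, List[List[str]]] = {
    "f1 f2 f3": [["a1", "a2"], ["b1"], ["c1", "c2"]],
    "f4 f5": [["d1", "d2"], ["e1"]],
}
winner = best_over_schemes(schemes_with_candidates, toy_score)
# winner == (("d1", "e1"), -0.75)
```

Here the optimum of “f1 f2 f3” scores -1.0 and the optimum of “f4 f5” scores -0.75, so the latter is kept. The same function handles more than two splitting schemes without change.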
The apparatus 700 for generating a translation in this embodiment and each of its components can be implemented as a dedicated circuit or a CMOS chip, or can be realized by a computer (processor) executing a corresponding program.
By using the apparatus 700 for generating a translation of the embodiment, aligned bilingual example sentences are used as translation knowledge (namely, feature functions), and the efficiency of generating a translation is effectively improved relative to a rule-based apparatus for generating a translation. At the same time, this apparatus can generate a translation of better quality in a specific application.
Further, by using the apparatus 700 for generating a translation of the embodiment, a generated translation is evaluated from different points of view with a plurality of kinds of translation knowledge, so that a translation of high quality is obtained. For example, since the translation knowledge used comprises semantic resources and a target language model, the generated translation is fluent and is also semantically very similar to the inputted sentence.
Further, the apparatus 700 for generating a translation of the embodiment can be extended by adding new translation knowledge, so that the quality of the translation can be further improved.
Apparatus for Generating a Translation
Under the same inventive conception,
As shown in
Specifically, in this embodiment, one or more translation fragments of the second language corresponding to each possible fragment of the sentence of the first language to be translated are retrieved from an aligned bilingual example corpus by matching. The aligned bilingual example corpus is a bilingual example corpus that has been word-aligned either manually by a professional (for example, a translator) or automatically by a computer, and it comprises a plurality of example sentence pairs of the first language and the second language together with alignment information for each sentence pair. It should be understood that the present invention places no special limitation on the method for matching the sentence of the first language to be translated; any method known in the art can be used, provided that a corresponding translation fragment can be found in the aligned bilingual example corpus for each possible fragment of the sentence to be translated.
In this embodiment, the searching unit may be any searching unit known in the art, for example, a searching unit that performs a beam search algorithm, an A search algorithm, an A* search algorithm or the like, and the present invention places no special limitation on this. A detailed description of the process of a search algorithm will be given in conjunction with
In the embodiment of
A sentence fragment: There is a red jacket on the bed
A translation fragment:
In
S: a flag for each word; a word that has been translated is marked with “*”, and a word that has not been translated is marked with “-”;
T: the translation of the words marked with “*”;
Score: an integrated score of the translation obtained.
Specifically, the beam search algorithm is performed as follows:
First, the lists S[0] to S[9] (one list for each number of translated words, words=0 . . . 9) are initialized;
Next, for s=0 to 9:
Extending each state in S[s]:
A new state is stored in the list corresponding to its flags. If the number of words translated in the state is x, the state is stored in the list of words=x.
If the list already contains a state identical to the new state, the two states are compared, and the state with the higher score is kept.
Pruning the list:
If the number of states in a list exceeds a predetermined threshold, the states with the lowest scores are pruned.
Finally, the translation fragment combination with the highest score in the list S[9] is selected as the optimum translation fragment combination of the second language for the sentence of the first language to be translated.
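The procedure above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: each option carries a hypothetical log score, the optional `fluency` callable is a hypothetical stand-in for the part of the integrated score that depends on the transition between fragments, and each list is pruned just before its entries are extended, which has the same effect because entries only ever move to lists with more translated words.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (start, end, target, log_score): source words [start, end) may be rendered
# as `target`; `log_score` is a hypothetical stand-in for the fragment score.
Option = Tuple[int, int, str, float]
Key = Tuple[str, str]  # (flags S, translation T)


def beam_search(
    n_words: int,
    options: List[Option],
    beam_size: int = 5,
    fluency: Callable[[str, str], float] = lambda prev, target: 0.0,
) -> Optional[Tuple[str, float]]:
    # lists[x] holds the entries in which exactly x source words are translated.
    lists: List[Dict[Key, float]] = [dict() for _ in range(n_words + 1)]
    lists[0][("-" * n_words, "")] = 0.0
    for s in range(n_words):
        # Prune: keep only the beam_size best entries of this list.
        kept = sorted(lists[s].items(), key=lambda kv: kv[1], reverse=True)[:beam_size]
        for (flags, trans), score in kept:
            for start, end, target, log_score in options:
                if not 0 <= start < end <= n_words or "*" in flags[start:end]:
                    continue  # invalid span, or some word is already translated
                new_flags = flags[:start] + "*" * (end - start) + flags[end:]
                new_trans = (trans + " " + target).strip()
                new_score = score + log_score + fluency(trans, target)
                bucket = lists[new_flags.count("*")]
                key = (new_flags, new_trans)
                # Identical entries are merged; the higher score is kept.
                if key not in bucket or new_score > bucket[key]:
                    bucket[key] = new_score
    if not lists[n_words]:
        return None  # no combination covers the whole sentence
    (_, best_trans), best_score = max(lists[n_words].items(), key=lambda kv: kv[1])
    return best_trans, best_score


def toy_fluency(prev: str, target: str) -> float:
    # Hypothetical penalty when the new fragment does not follow alphabetically.
    return -1.0 if prev and prev.split()[-1] > target.split()[0] else 0.0


toy_options: List[Option] = [
    (0, 1, "A", -1.0), (1, 3, "B C", -1.0), (0, 2, "A B", -0.5),
    (2, 3, "C", -0.25), (1, 2, "B", -2.0),
]
result = beam_search(3, toy_options, beam_size=5, fluency=toy_fluency)
# result == ("A B C", -0.75)
```

In this toy run the translation “A B C” is reached both as “A” + “B C” (score -2.0) and as “A B” + “C” (score -0.75); the two are recognized as identical and only the higher score survives. A smaller `beam_size` makes the search faster but may prune the path that leads to the best final combination.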
In the above-mentioned search algorithm, the integrated score obtained from a plurality of feature functions on each translation fragment or each fragment combination is calculated by the calculating unit 801 based on the method of the above-mentioned embodiment of
The apparatus 800 for generating a translation in this embodiment and each of its components can be implemented as a dedicated circuit or a CMOS chip, or can be realized by a computer (processor) executing a corresponding program.
By using the apparatus 800 for generating a translation of the embodiment, aligned bilingual example sentences are used as translation knowledge (namely, feature functions), and the efficiency of generating a translation is effectively improved relative to a rule-based apparatus for generating a translation. At the same time, this apparatus can generate a translation of better quality in a specific application.
Further, by using the apparatus 800 for generating a translation of the embodiment, a generated translation is evaluated from different points of view with a plurality of kinds of translation knowledge, so that a translation of high quality is obtained. For example, since the translation knowledge used comprises semantic resources and a target language model, the generated translation is fluent and is also semantically very similar to the inputted sentence.
Further, the apparatus 800 for generating a translation of the embodiment can be extended by adding new translation knowledge, so that the quality of the translation can be further improved.
Further, the apparatus 800 for generating a translation of the embodiment does not need to split the sentence of the first language to be translated in advance; it can generate a translation of high quality simply by using a search algorithm.
Apparatus for Machine Translation
Under the same inventive conception,
As shown in
Specifically, in this embodiment, the sentence of the first language to be translated is split into a plurality of fragments manually or automatically, and one or more translation fragments of the second language corresponding to each of the plurality of fragments are retrieved from an aligned bilingual example corpus by matching. The aligned bilingual example corpus is a bilingual example corpus that has been word-aligned either manually by a professional (for example, a translator) or automatically by a computer, and it comprises a plurality of example sentence pairs of the first language and the second language together with alignment information for each sentence pair. It should be understood that the present invention places no special limitation on the method for splitting the sentence of the first language to be translated; any method known in the art can be used, provided that the sentence to be translated can be split into effective fragments, that is, fragments whose translation fragments can be found in the aligned bilingual example corpus.
The apparatus 700 for generating a translation of the embodiment is an apparatus for generating a translation of the above-mentioned embodiment of
The apparatus 900 for machine translation in this embodiment and each of its components can be implemented as a dedicated circuit or a CMOS chip, or can be realized by a computer (processor) executing a corresponding program.
By using the apparatus 900 for machine translation of the embodiment, aligned bilingual example sentences are used as translation knowledge (namely, feature functions), and the efficiency of machine translation is effectively improved relative to a rule-based apparatus for machine translation. At the same time, this apparatus can generate a translation of better quality in a specific application.
Further, by using the apparatus 900 for machine translation of the embodiment, a generated translation is evaluated from different points of view with a plurality of kinds of translation knowledge, so that a translation of high quality is obtained. For example, since the translation knowledge used comprises semantic resources and a target language model, the generated translation is fluent and is also semantically very similar to the inputted sentence.
Further, the apparatus 900 for machine translation of the embodiment can be extended by adding new translation knowledge, so that the quality of the translation can be further improved.
Apparatus for Machine Translation
Under the same inventive conception,
As shown in
Specifically, in this embodiment, one or more translation fragments of the second language corresponding to each possible fragment of the sentence of the first language to be translated are retrieved from an aligned bilingual example corpus by matching. The aligned bilingual example corpus is a bilingual example corpus that has been word-aligned either manually by a professional (for example, a translator) or automatically by a computer, and it comprises a plurality of example sentence pairs of the first language and the second language together with alignment information for each sentence pair. It should be understood that the present invention places no special limitation on the method for matching the sentence of the first language to be translated; any method known in the art can be used, provided that a corresponding translation fragment can be found in the aligned bilingual example corpus for each possible fragment of the sentence to be translated.
The apparatus 800 for generating a translation of the embodiment is an apparatus for generating a translation of the above-mentioned embodiment of
The apparatus 1000 for machine translation in this embodiment and each of its components can be implemented as a dedicated circuit or a CMOS chip, or can be realized by a computer (processor) executing a corresponding program.
By using the apparatus 1000 for machine translation of the embodiment, aligned bilingual example sentences are used as translation knowledge (namely, feature functions), and the efficiency of machine translation is effectively improved relative to a rule-based apparatus for machine translation. At the same time, this apparatus can generate a translation of better quality in a specific application.
Further, by using the apparatus 1000 for machine translation of the embodiment, a generated translation is evaluated from different points of view with a plurality of kinds of translation knowledge, so that a translation of high quality is obtained. For example, since the translation knowledge used comprises semantic resources and a target language model, the generated translation is fluent and is also semantically very similar to the inputted sentence.
Further, the apparatus 1000 for machine translation of the embodiment can be extended by adding new translation knowledge, so that the quality of the translation can be further improved.
Further, the apparatus 1000 for machine translation of the embodiment does not need to split the sentence of the first language to be translated in advance; it can generate a translation of high quality simply by using a search algorithm.
Though a method for generating a translation, a method for machine translation, an apparatus for generating a translation, and an apparatus for machine translation have been described in detail with some exemplary embodiments, the above embodiments are not exhaustive. Those skilled in the art can make various variations and modifications within the spirit and the scope of the present invention. Therefore, the present invention is not limited to these embodiments; rather, the scope of the present invention is defined only by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
200710089195.1 | Mar 2007 | CN | national |