Claims
- 1. A method comprising:
training a translation model with a plurality of translation pairs, each translation pair including a text segment in a source language and a corresponding text segment in a target language; generating a plurality of tuples from each of a plurality of said translation pairs, each tuple comprising a phrase in the source language, a phrase in the target language, and probability information relating to said phrases; and storing the tuples in a statistical translation memory.
- 2. The method of claim 1, wherein the probability information relating to said phrases comprises alignment information.
- 3. The method of claim 1, wherein said generating comprises pairing a plurality of phrases in the target language with a phrase in the source language.
- 4. The method of claim 3, further comprising:
selecting one translation equivalent from said plurality of phrases in the target language; and associating said translation equivalent with the phrase in the source language.
- 5. The method of claim 4, wherein said selecting comprises selecting a phrase occurring most frequently in the extracted phrases.
- 6. The method of claim 4, wherein said selecting comprises selecting a phrase having a highest probability of being a correct translation of said phrase in the source language.
- 7. The method of claim 1, further comprising:
judging a correctness of a plurality of said tuples; and selecting a tuple in a translation operation in response to said judgment.
- 8. A statistical translation memory comprising:
a plurality of tuples extracted from a plurality of translation pairs in a corpus, each tuple including a text segment in a source language and a corresponding text segment in a target language.
- 9. The statistical translation memory of claim 8, wherein each tuple further comprises alignment information relating to the text segments in said each tuple.
- 10. The statistical translation memory of claim 8, wherein the text segment in the target language in each of a plurality of tuples is selected from a plurality of text segments in the target language extracted from the corpus.
- 11. The statistical translation memory of claim 10, wherein said selected text segment has been selected based on a calculated probability of correctness.
- 12. The statistical translation memory of claim 10, wherein said selected text segment has been selected based on a frequency of occurrence in the extracted text segments.
- 13. Apparatus comprising:
a translation model operative to assign a probability to each of a plurality of translation pairs, each translation pair including a text segment in a source language and a corresponding text segment in a target language; an extraction module operative to extract a plurality of tuples from each of a plurality of said translation pairs, each tuple comprising a phrase in the source language, a phrase in the target language, and probability information relating to said phrases; and a statistical translation memory operative to store the tuples.
- 14. The apparatus of claim 13, wherein the probability information relating to said phrases comprises alignment information.
- 15. The apparatus of claim 13, wherein the extraction module is further operative to pair a plurality of phrases in the target language with a phrase in the source language.
- 16. The apparatus of claim 15, wherein the extraction module is further operative to select one translation equivalent from said plurality of phrases and associate said translation equivalent with the phrase in the source language.
- 17. The apparatus of claim 16, wherein the extraction module is operative to select, from said plurality of phrases, a phrase occurring most frequently in the extracted phrases.
- 18. The apparatus of claim 16, wherein the extraction module is operative to select a phrase having a highest probability of being a correct translation of said phrase in the source language.
- 19. The apparatus of claim 16, wherein the extraction module is operative to select a phrase having an alignment of highest probability with said phrase in the source language.
- 20. An article comprising a machine-readable medium including instructions operative to cause a machine to:
train a translation model with a plurality of translation pairs, each translation pair including a text segment in a source language and a corresponding text segment in a target language; generate a plurality of tuples from each of a plurality of said translation pairs, each tuple comprising a phrase in the source language, a phrase in the target language, and probability information relating to said phrases; and store the tuples in a statistical translation memory.
- 21. The article of claim 20, wherein the probability information relating to said phrases comprises alignment information.
- 22. The article of claim 20, wherein said generating comprises pairing a plurality of phrases in the target language with a phrase in the source language.
- 23. The article of claim 22, further comprising:
selecting one translation equivalent from said plurality of phrases in the target language; and associating said translation equivalent with the phrase in the source language.
- 24. The article of claim 23, wherein said selecting comprises selecting a phrase occurring most frequently in the extracted phrases.
- 25. The article of claim 23, wherein said selecting comprises selecting a phrase having a highest probability of being a correct translation of said phrase in the source language.
- 26. The article of claim 23, wherein said selecting comprises selecting a phrase having an alignment of highest probability with said phrase in the source language.
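As an illustration only, the steps recited in the claims above (generating tuples from translation pairs, storing them in a statistical translation memory, and selecting one translation equivalent) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `PhraseTuple`, `StatisticalTranslationMemory`, and `extract_tuples` are hypothetical, and the aligned spans with their probabilities are assumed to have been produced already by a trained translation model, which the sketch does not include.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class PhraseTuple(NamedTuple):
    """Hypothetical tuple: source phrase, target phrase, probability information."""
    source: str
    target: str
    probability: float  # assumed to come from a trained translation model


def extract_tuples(source_tokens, target_tokens, aligned_spans):
    """Build tuples from one translation pair.

    aligned_spans is an assumed input format: a list of
    ((src_start, src_end), (tgt_start, tgt_end), probability) entries,
    with end indices exclusive.
    """
    return [
        PhraseTuple(
            " ".join(source_tokens[s0:s1]),
            " ".join(target_tokens[t0:t1]),
            prob,
        )
        for (s0, s1), (t0, t1), prob in aligned_spans
    ]


class StatisticalTranslationMemory:
    """Toy in-memory store mapping each source phrase to its candidate tuples."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def store(self, tuples):
        for t in tuples:
            self._by_source[t.source].append(t)

    def most_frequent(self, source):
        """Select the target phrase extracted most often for this source phrase."""
        candidates = self._by_source.get(source)
        if not candidates:
            return None
        counts = Counter(t.target for t in candidates)
        return counts.most_common(1)[0][0]

    def most_probable(self, source):
        """Select the target phrase whose tuple carries the highest probability."""
        candidates = self._by_source.get(source)
        if not candidates:
            return None
        return max(candidates, key=lambda t: t.probability).target


# Made-up example data; the probabilities are illustrative, not model output.
memory = StatisticalTranslationMemory()
memory.store(extract_tuples(
    "la maison est grande".split(), "the house is big".split(),
    [((0, 2), (0, 2), 0.6), ((2, 4), (2, 4), 0.7)],
))
memory.store(extract_tuples(
    "la maison est vide".split(), "the home is empty".split(),
    [((0, 2), (0, 2), 0.9)],
))
memory.store(extract_tuples(
    "j'aime la maison".split(), "i like the house".split(),
    [((1, 3), (2, 4), 0.5)],
))

print(memory.most_frequent("la maison"))  # the house
print(memory.most_probable("la maison"))  # the home
```

The two selection methods can disagree, as in this example: the frequency-based choice favors the target phrase seen in the most extracted tuples, while the probability-based choice favors the single best-scoring tuple.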
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of, and incorporates herein, U.S. Provisional Patent Application No. 60/291,852, filed May 17, 2001.
ORIGIN OF INVENTION
[0002] The research and development described in this application were supported by DARPA-ITO under grant number N66001-00-1-9814. The U.S. Government may have certain rights in the claimed inventions.
Provisional Applications (1)

| Number | Date | Country |
|---|---|---|
| 60291852 | May 2001 | US |