Claims
- 1. A method comprising:
receiving an input text string in a source language; comparing a plurality of alternate translations for said input text string in a target language to text segments in a monolingual corpus in the target language; and recording a number of occurrences of each of at least a plurality of said plurality of alternate translations.
- 2. The method of claim 1, further comprising:
assigning a probability score to said at least a plurality of said plurality of alternative translations.
- 3. The method of claim 2, further comprising:
selecting at least a portion of said plurality of alternate translations based on the probability scores.
- 4. The method of claim 2, further comprising:
ranking the scored alternative translations based on the probability scores.
- 5. The method of claim 4, further comprising:
re-ranking the ranked alternate translations based on the recorded number of occurrences.
- 6. The method of claim 5, further comprising:
ranking and re-ranking alternate translations for a plurality of input text strings.
- 7. The method of claim 6, further comprising:
training a statistical machine translator with re-ranked alternate translations.
- 8. The method of claim 6, further comprising:
building parallel corpora in the source language and the target language using re-ranked alternate translations.
- 9. The method of claim 1, further comprising:
building a finite state acceptor for the input text string operative to encode a plurality of alternative translations for said input text string in the target language.
- 10. The method of claim 9, wherein said comparing the plurality of alternate translations to text segments in the monolingual corpus comprises inputting text segments in the monolingual corpus to the finite state acceptor.
- 11. The method of claim 1, further comprising:
generating a plurality of alternate translations for said input text string in a target language using a language model.
- 12. The method of claim 1, further comprising:
deriving the monolingual corpus from a collection of documents.
- 13. The method of claim 1, further comprising:
deriving the monolingual corpus from text on the World Wide Web.
- 14. A method comprising:
receiving an input text string in a source language; building a finite state acceptor for the input text string operative to encode a plurality of alternative translations for said input text string in a target language; inputting text segments in a monolingual corpus in the target language to the finite state acceptor; recording text segments accepted by the finite state acceptor; and recording a number of occurrences for each of said accepted text segments.
- 15. The method of claim 14, further comprising:
ranking each of the accepted text segments based at least in part on the number of occurrences.
- 16. The method of claim 15, further comprising:
identifying a probability for each of the accepted text segments; and ranking the accepted text segments at least in part on the probabilities.
- 17. An apparatus comprising:
a translation model component operative to receive an input text string in a source language and generate a plurality of alternate translations for said input text string, the alternate translations comprising text segments in a target language; a corpus comprising a plurality of text segments in the target language; and a translation ranking module operative to record a number of occurrences of said alternate translations in the corpus.
- 18. The apparatus of claim 17, wherein said translation model component is operative to generate a finite state acceptor encoding the plurality of alternate translations.
- 19. The apparatus of claim 17, wherein the corpus comprises documents available on the World Wide Web.
- 20. An article comprising a machine-readable medium including machine-executable instructions, the instructions operative to cause the machine to:
receive an input text string in a source language; compare a plurality of alternate translations for said input text string in a target language to text segments in a monolingual corpus in the target language; and record a number of occurrences of each of at least a plurality of said plurality of alternate translations.
- 21. The article of claim 20, further comprising instructions operative to cause the machine to:
assign a probability score to said at least a plurality of said plurality of alternative translations.
- 22. An article comprising a machine-readable medium including machine-executable instructions, the instructions operative to cause the machine to:
receive an input text string in a source language; build a finite state acceptor for the input text string operative to encode a plurality of alternative translations for said input text string in a target language; input text segments in a monolingual corpus in the target language to the finite state acceptor; record text segments accepted by the finite state acceptor; and record a number of occurrences for each of said accepted text segments.
- 23. The article of claim 22, further comprising instructions operative to cause the machine to:
rank each of the accepted text segments based at least in part on the number of occurrences.
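For illustration only, the steps recited in claims 1, 4, 5, 9, 10 and 14 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`build_acceptor`, `accepts`, `count_occurrences`, `rerank`) are hypothetical, the finite state acceptor is approximated by a word-level trie, a corpus text segment is assumed to match only when it equals an alternate translation word for word, and re-ranking is assumed to sort by occurrence count with the probability score as a tie-breaker.

```python
from collections import Counter


def build_acceptor(alternates):
    """Build a word-level trie accepting exactly the given alternate translations.

    Assumption: a trie stands in for the finite state acceptor of claim 9.
    """
    root = {}
    for text in alternates:
        node = root
        for word in text.split():
            node = node.setdefault(word, {})
        node[None] = text  # the None key marks an accepting state
    return root


def accepts(acceptor, segment):
    """Return the accepted alternate translation, or None if the segment is rejected."""
    node = acceptor
    for word in segment.split():
        node = node.get(word)
        if node is None:
            return None
    return node.get(None)


def count_occurrences(alternates, corpus_segments):
    """Feed corpus segments to the acceptor and count each accepted translation."""
    acceptor = build_acceptor(alternates)
    counts = Counter()
    for segment in corpus_segments:
        match = accepts(acceptor, segment)
        if match is not None:
            counts[match] += 1
    return counts


def rerank(scored_alternates, corpus_segments):
    """Rank (translation, probability) pairs, then re-rank by corpus count.

    Assumption: ties in corpus count keep the probability-based order.
    """
    counts = count_occurrences([t for t, _ in scored_alternates], corpus_segments)
    ranked = sorted(scored_alternates, key=lambda pair: pair[1], reverse=True)
    # Python's sort is stable, so equal counts preserve the ranking above.
    reranked = sorted(ranked, key=lambda pair: counts[pair[0]], reverse=True)
    return [translation for translation, _ in reranked]


# Hypothetical data, invented for this example.
scored = [("he go home", 0.5), ("he goes home", 0.3), ("he goes to home", 0.2)]
corpus = ["he goes home", "she goes home", "he goes home", "he goes to home"]
print(rerank(scored, corpus))
```

In this toy example the translation model prefers "he go home", but that string never occurs in the corpus, so the corpus counts move "he goes home" to the top of the re-ranked list.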
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application Serial No. 60/368,071, filed on Mar. 26, 2002, the disclosure of which is incorporated by reference.
ORIGIN OF INVENTION
[0002] The research and development described in this application were supported by DARPA under grant number N66001-00-1-8914. The U.S. Government may have certain rights in the claimed inventions.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60368071 | Mar 2002 | US |