This Application claims priority of Taiwan Patent Application No. 97145471, filed on Nov. 25, 2008, the entirety of which is incorporated by reference herein.
1. Field of the Invention
The invention relates generally to a translation method and apparatus and storage media using the same, and more particularly, to a translation method and apparatus and storage media using the same for cross-language information retrieval.
2. Description of the Related Art
With increased internet access, information retrieval via the internet has grown in popularity, and cross-language information retrieval has grown with it. Two conventional methods are used for cross-language information retrieval: manual translation of the information in advance, and key term translation of the information.
While manual translation of information in advance results in better quality translations, its high cost limits its feasibility. Meanwhile, key term translation, while more feasible than manual translation, yields lower quality translations and is therefore less useful.
The invention discloses an information retrieval translation method for translating a plurality of Chinese terms comprising a first Chinese term and a second Chinese term. The information retrieval translation method comprises comparing the first Chinese term with a plurality of first indices stored in a first language database, wherein the first language database has a plurality of first translation terms corresponding to the first indices. Additionally, the first translation term corresponding to the first index which matches the first Chinese term is acquired. Also, the second Chinese term is compared with a plurality of second indices stored in a second language database, wherein the second language database has a plurality of second translation terms corresponding to the second indices. Moreover, the second translation term corresponding to the second index which matches the second Chinese term is acquired.
Furthermore, the invention discloses an information retrieval translation apparatus for translating a plurality of Chinese terms comprising a first Chinese term and a second Chinese term. The information retrieval translation apparatus comprises a first language database, a second language database, a comparison module and a translation term acquisition module. The first language database stores a plurality of first indices and a plurality of first translation terms corresponding to the first indices. The second language database stores a plurality of second indices and a plurality of second translation terms corresponding to the second indices. The comparison module compares the first Chinese term with the first indices, and the second Chinese term with the second indices. The translation term acquisition module acquires the corresponding first translation term for the first index which corresponds to the first Chinese term, and the corresponding second translation term for the second index which corresponds to the second Chinese term.
Furthermore, the invention discloses a storage medium for storing an information retrieval translation program, wherein the information retrieval translation program comprises a plurality of program codes to be loaded onto a computer system so that an information retrieval translation method for translating a plurality of Chinese terms comprising a first Chinese term and a second Chinese term may be executed by the computer system. The information retrieval translation method comprises comparing the first Chinese term with a plurality of first indices stored in a first language database, wherein the first language database has a plurality of first translation terms corresponding to the first indices. Additionally, the first translation term corresponding to the first index which matches the first Chinese term is acquired. Also, the second Chinese term is compared with a plurality of second indices stored in a second language database, wherein the second language database has a plurality of second translation terms corresponding to the second indices. Moreover, the second translation term corresponding to the second index which matches the second Chinese term is acquired.
The invention can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
nai zhen ping gu bu qiang gong zuo zhi
kao liang , ying jian li yi chu bu ping
Next, the stop word removal module 13 removes the stop words from Table 1 (step S22). The stop words are unimportant terms and punctuation marks, such as “ji”, “zhi”, “yi”, “yi (AA)” and “ying”. The remaining Chinese terms are shown in Table 2 below:
jian li chu bu ping gu fang fa zuo wei
chu bu shai xuan you xian jin xing nai
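The stop word removal of step S22 can be illustrated with a minimal Python sketch. It is an illustration only, not the disclosed implementation: the pinyin strings stand in for the Chinese characters, the segmentation of the input into terms is assumed to have been done already, and the names `STOP_WORDS` and `remove_stop_words` are hypothetical.

```python
# Stop words taken from the pinyin examples in the text; a real list would
# hold the Chinese characters themselves. Punctuation is treated as a stop word.
STOP_WORDS = {"ji", "zhi", "yi", "ying", ","}


def remove_stop_words(terms):
    """Return the terms that are not stop words, preserving their order."""
    return [term for term in terms if term not in STOP_WORDS]


# Hypothetical segmentation of a fragment of Table 1.
segmented = ["ying", "jian li", "yi", "chu bu", "ping gu", ","]
remaining = remove_stop_words(segmented)
print(remaining)  # ['jian li', 'chu bu', 'ping gu']
```

A set is used for the stop words so that each membership check is a constant-time lookup regardless of the size of the list.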
The content of Table 2 is next used to apply the information retrieval translation method of the invention. The first language database 14 is first used to translate the content of Table 2. The first language database 14 may be a general dictionary for general translations rather than a professional dictionary for professional translations. In addition, the first language database 14 stores a plurality of first indices and a plurality of first translation terms corresponding to the first indices. For example, a first index may be “jian li”, whereas a translation term corresponding to the first index may be “establish”, “create” or “build”. Note that “jian li” is merely a phonetic transcription (pinyin) of the Chinese characters, and not an English translation, which is “establish”, “create” or “build”.
Next, the comparison module 16 compares each Chinese term of Table 2 with the first indices stored in the first language database 14 (general dictionary) (step S23). If a first index is found corresponding to the Chinese term of Table 2, the translation term acquisition module 17 acquires the first translation term corresponding to the first index (step S24).
Through the processing of steps S23 and S24, the result may be seen in Table 3 below:
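Steps S23 and S24 amount to an index lookup in the general dictionary. The following Python sketch shows one possible form, assuming the first language database can be modeled as an in-memory mapping from index to translation terms; the names `GENERAL_DICTIONARY` and `lookup_general` are hypothetical, and pinyin strings again stand in for the Chinese characters.

```python
# Hypothetical stand-in for the first language database 14 (general dictionary).
# The "jian li" entry comes from the example in the text.
GENERAL_DICTIONARY = {
    "jian li": ["establish", "create", "build"],
}


def lookup_general(terms, dictionary):
    """Compare each term with the indices (S23) and acquire matches (S24).

    Returns the translated terms and the list of terms with no matching index.
    """
    translated = {}
    untranslated = []
    for term in terms:
        if term in dictionary:
            translated[term] = dictionary[term]
        else:
            untranslated.append(term)
    return translated, untranslated


translated, untranslated = lookup_general(["jian li", "bu qiang"], GENERAL_DICTIONARY)
print(translated)    # {'jian li': ['establish', 'create', 'build']}
print(untranslated)  # ['bu qiang']
```

Returning the unmatched terms separately keeps them available for any later processing.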
As seen in Table 3, the remaining Chinese terms were not translated. Therefore, a professional dictionary (second language database 15) is used for a better quality translation.
Next, the comparison module 16 compares the Chinese terms that were not translated with the second indices stored in the second language database 15 (professional dictionary) (step S25). Note that the second language database 15 also stores a plurality of second indices and a plurality of second translation terms corresponding to the second indices. Following step S25, if a second index is found corresponding to a Chinese term that was not translated, the translation term acquisition module 17 acquires the corresponding second translation term stored in the second language database 15 (step S26). With steps S25 and S26, the Chinese term “bu qiang” of Table 3 may be translated as “reinforcement”. However, some Chinese terms may still not be translated, such as “ji yu”, “bian lie”, “nai zhen” and “xiao she”. Thus, manual translation is applied via an input interface (not shown), such as a keyboard or a mouse (step S27). A detailed description of step S27 is given with reference to
The content of Table 3 may be translated as Table 4 using the rule introduced in
Compare this with the result of purely manual translation: “when considering costs and expedience of assuring seismically standard school buildings, a preliminary seismic evaluation should first be conducted to prioritize the retrofitting of school buildings”. Despite the difference in translation quality, the result illustrated in Table 4 lists the key terms needed for cross-language information retrieval, and thus provides substantially the same performance as the manual translation for information retrieval.
Note that during application, training of key term(s) is applied in the information retrieval translation method of the invention to achieve more expedient cross-language information retrieval.
Note that in step S273, the translation terms of the unimportant Chinese terms are directly replaced with the punctuation mark “;” without translation, and these Chinese terms are stored in the professional dictionary. Thus, training of the professional dictionary is achieved, decreasing the time required for future processing. Similarly, in step S274, the translation terms obtained from manual translation are also stored in the professional dictionary for training purposes (step S275). The translation for the same Chinese term may then be obtained directly from the professional dictionary without repeated manual translation, which decreases the future need for manual translation and its cost and increases the quality of translations.
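The lookup order and the training of the professional dictionary can be sketched together in Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: both dictionaries are modeled as in-memory mappings with a single translation per index, manual translation is modeled by a hypothetical callback `ask_user` that returns `None` for a term judged unimportant, and the answer “seismic” for “nai zhen” is an illustrative value only.

```python
def translate_term(term, general, professional, ask_user):
    """Translate one term: general dictionary, then professional, then manual."""
    if term in general:                 # steps S23-S24
        return general[term]
    if term in professional:            # steps S25-S26
        return professional[term]
    manual = ask_user(term)             # step S27 (hypothetical callback)
    # Assumption: None marks an unimportant term, which is replaced by ";"
    # as described for step S273.
    translation = ";" if manual is None else manual
    professional[term] = translation    # training: store for later reuse
    return translation


general = {"jian li": "establish"}
professional = {"bu qiang": "reinforcement"}
asked = []  # records every term that needed manual translation


def ask_user(term):
    asked.append(term)
    # Hypothetical answers standing in for a human translator.
    return {"nai zhen": "seismic"}.get(term)


results = [translate_term(t, general, professional, ask_user)
           for t in ["jian li", "bu qiang", "nai zhen", "ji yu", "nai zhen"]]
print(results)  # ['establish', 'reinforcement', 'seismic', ';', 'seismic']
print(asked)    # ['nai zhen', 'ji yu']
```

In this sketch the second occurrence of “nai zhen” is served from the professional dictionary, so the manual step is invoked only once per distinct term.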
In addition, the information retrieval translation method can be recorded as a program for performing the above procedures in a storage medium, such as an optical disk, a floppy disk or a portable hard drive. It is to be emphasized that the information retrieval translation method program is formed by a plurality of program codes corresponding to the procedures described above.
While the invention has been described by way of example and in terms of the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Number | Date | Country | Kind |
---|---|---|---|
TW97145471 | Nov 2008 | TW | national |