1. Field of the Invention
The present invention relates to a method for improving the quality of a machine translation.
2. Description of the Related Art
As a result of significant advances in global electronic communication, demand for machine translation from one language to another is increasing. A machine translation is performed by using a computer to replace characters (words) with other characters (words): the computer analyzes the characters and applies dictionary data or a predetermined algorithm to translate from a specific language to a different language. If a text is not stored in a computer-readable format, in other words, if character information is not included in the text, it is necessary, prior to the translation process, to perform an OCR process in which a printed text is read by a scanner device, a character recognition process is performed, and character information is extracted.
One advantage of machine translation is that a large volume of documents can be translated in a short time; a disadvantage is that the quality of the translated document is usually relatively low. One reason for this disadvantage is that the machine translation process uses rules such as dictionary data or algorithms, and these rules do not adapt flexibly to the type of document to be translated, for example, a business document or a technical document. As a result, some of the translated words do not convey the original meaning. Therefore, to improve the quality of a machine-translated text, it is necessary for a person to check the translated text and replace each unsuitable translated word with a suitable word. Several techniques exist for assisting a person in correcting a machine-translated text. In one known technique, translations of specific words in an original text are displayed between the lines of the original text. In another known technique, specific words in an original text and their translations are listed.
According to the techniques described above, it is possible to display on a screen an original text side by side with the machine-translated text, thereby making it easier for a person to rewrite a machine-translated text. However, a problem exists in that a person must manually input a suitable translation for every unsuitable translation. This problem reduces the advantage of performing a machine translation.
The present invention has been made in view of the above circumstances.
To address the problems described above, the present invention provides a translation processing method including: registering a type of annotation with a corresponding translation rule in a table; identifying a document to be processed; extracting an annotation added to a text element from the identified document; identifying a type of the extracted annotation added to the text element; and translating the text element according to the registered translation rule corresponding to the identified type of the extracted annotation.
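The steps of the method above can be illustrated with a minimal Python sketch. The annotation type name ("underline"), the two rule functions, and the toy glossary are hypothetical and are not taken from the specification; the sketch only shows a table that associates an annotation type with a translation rule and is consulted for each text element.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy glossary standing in for real dictionary data.
GLOSSARY: Dict[str, str] = {"bank": "銀行", "paper": "紙"}

# A translation rule maps a source text element to its translation.
Rule = Callable[[str], str]

# Table that associates an annotation type with a translation rule.
rule_table: Dict[str, Rule] = {}


def register_rule(annotation_type: str, rule: Rule) -> None:
    """Register a type of annotation with its corresponding translation rule."""
    rule_table[annotation_type] = rule


def keep_original(element: str) -> str:
    """Hypothetical rule: leave the text element untranslated."""
    return element


def translate_with_glossary(element: str) -> str:
    """Hypothetical default rule: look the element up in the toy glossary."""
    return GLOSSARY.get(element, element)


def translate_document(
    elements: List[Tuple[str, Optional[str]]],
    default_rule: Rule = translate_with_glossary,
) -> List[str]:
    """Translate (text element, annotation type or None) pairs.

    Elements whose annotation type has a registered rule use that rule;
    all other elements fall back to the default rule.
    """
    translated = []
    for text, annotation_type in elements:
        rule = rule_table.get(annotation_type, default_rule)
        translated.append(rule(text))
    return translated


# Assumed example: an underline means "keep the original word".
register_rule("underline", keep_original)
result = translate_document([("bank", None), ("bank", "underline")])
```

In this sketch the same word is translated differently depending on whether an annotation is attached to it, which is the behavior the method describes.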
Embodiments of the present invention will be described in detail based on the figures, wherein:
Referring next to the drawings, preferred embodiments of the present invention will be explained.
Annotation recognition unit 102 performs a predetermined analysis process of image data of an area, excluding separated and extracted characters, to determine the type of annotation and the portion where the annotation is added (namely, elements that form a text, such as a word and a term). The types of annotation that are extracted include items such as a sticky tag, a moving border, an underline, a highlight, a leader line, and a note (words inserted between lines of an original text). Information relating to the type of annotation and the portion to which the annotation is added is stored in memory 11. Character recognition unit 103 performs a character recognition process on an area separated and extracted by document structure analysis unit 101, extracts character information (a lexical token), and stores it in memory 11. Translation processing unit 104 uses dictionary data stored in memory 11 and a predetermined algorithm to substitute character information extracted by character recognition unit 103 so as to perform a translation process in which the document is translated into a language specified by a user. The text data being translated and the relations between the words in an original text and the words in translation are stored in memory 11.
For image data of a document to which an annotation is added, document structure analysis unit 101, annotation recognition unit 102, character recognition unit 103, and translation processing unit 104 are used to perform a translation process for annotated and character portions. In this way, a function is realized for extracting, for each annotation, information relating to the type of the annotation, the words in the original text to which the annotation is added, and the corresponding translated words. Details of the process performed in control unit 10 will be given below. The functions of the units realized in control unit 10 may be realized by individual processors, or by one processor running a plurality of software applications.
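The per-annotation information described above (annotation type, annotated source word, and translated word) can be pictured as a simple record. The following Python sketch is only an illustration under stated assumptions: the class names, the use of a word index to locate the annotated portion, and the index-to-translation alignment map are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    kind: str        # e.g. "underline", "highlight", "note" (assumed names)
    word_index: int  # assumed: index of the annotated word in the source text


@dataclass(frozen=True)
class AnnotatedWord:
    kind: str
    source_word: str
    translated_word: str


def collect_annotated_words(
    source_words: List[str],
    alignment: Dict[int, str],
    annotations: List[Annotation],
) -> List[AnnotatedWord]:
    """Join each annotation with its source word and its translated word.

    `alignment` maps a source word index to the translated word, standing in
    for the stored relations between original words and translated words.
    """
    return [
        AnnotatedWord(a.kind, source_words[a.word_index], alignment[a.word_index])
        for a in annotations
    ]


# Assumed example data.
words = ["open", "the", "bank"]
alignment = {0: "開く", 1: "", 2: "銀行"}
records = collect_annotated_words(words, alignment, [Annotation("underline", 2)])
```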
Memory 11 is a storage device such as a RAM, a ROM, or a hard disk; the memory stores dictionary database DB and other reference data used when control unit 10 performs the above process. As shown in
Input unit 12 refers to, for example, a scanner device which scans documents printed on paper as digital image data and provides the data to both control unit 10 and memory 11. Operation unit 13b refers to an input device such as a keyboard or a mouse; the operation unit is used when a user of document translation device 1 specifies a document to be translated, writes information in a dictionary table Tp and a translation rule table Tr, specifies a portion to be edited, or inputs any other necessary information. The input instruction or information is provided to control unit 10. Display unit 14 has a processor for drawing (not shown) and a display device such as a liquid crystal display (not shown); the display unit, when given an instruction from control unit 10, displays on a screen an original text, a document undergoing translation, or various types of messages for a user. A user refers to a display screen of display unit 14 and inputs instructions through input unit 12 so as to have document translation device 1 execute various processes. Output unit 15 is a printer for printing the edited script on paper, a communication interface for providing to a printing device text data acquired after additional information editing processes have been performed, or a storage device for storing text data in a storage medium such as a flash memory or a CD-ROM.
Referring next to
Referring again to
More specifically, as shown in
Referring again to
The process then proceeds to step S23, wherein translation rule table Tr is referred to and the editing style corresponding to the identified annotation type is determined. In this step, when a note is identified in the table as an annotation, the document structure analysis unit refers to dictionary table Tp to determine the dictionary corresponding to the characters included in the note and the priority order for using each dictionary.
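The lookup performed in step S23 can be sketched in Python as follows. The actual contents of translation rule table Tr and dictionary table Tp are not reproduced here, so every entry below (the annotation types, the editing style names, the note keywords, and the dictionary names) is a hypothetical placeholder, as is the convention that a lower number means a higher priority.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical translation rule table Tr: annotation type -> editing style.
TR: Dict[str, str] = {
    "underline": "keep_original",
    "highlight": "retranslate",
    "note": "use_dictionaries_from_note",
}

# Hypothetical dictionary table Tp: keyword found in a note ->
# list of (dictionary name, priority); a lower number is a higher priority.
TP: Dict[str, List[Tuple[str, int]]] = {
    "technical": [("standard", 2), ("technical", 1)],
    "business": [("business", 1), ("standard", 2)],
}


def determine_editing_style(
    annotation_type: str, note_text: Optional[str] = None
) -> Tuple[str, List[str]]:
    """Return the editing style and, for notes, dictionaries in priority order."""
    if annotation_type not in TR:
        raise KeyError(f"no translation rule registered for {annotation_type!r}")
    style = TR[annotation_type]

    dictionaries: List[str] = []
    if annotation_type == "note" and note_text is not None:
        lowered = note_text.lower()
        for keyword, entries in TP.items():
            if keyword in lowered:
                # Sort by priority so the preferred dictionary comes first.
                ordered = sorted(entries, key=lambda entry: entry[1])
                dictionaries = [name for name, _ in ordered]
                break
    return style, dictionaries
```

For example, under these assumed tables, a note containing the word "Technical" yields the technical dictionary first and the standard dictionary second, while an underline yields an editing style and no dictionary list.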
Refer again to
As described above, by using document translation device 1, a user confirms the translated document and corrects the mistranslated part by specifying, with an annotation, both the portion that is to be edited and the editing style. Thus, it is possible to acquire a high-quality translation in a short time, without placing an excessive burden on a user.
<Modifications>
The present invention is not limited to the embodiments described above, and may be modified in various ways. The modifications will be shown below. In the embodiments described above, a standard dictionary (English-Japanese dictionary 111) is used by document translation device 1 for performing a translation process (temporary translation process), and a user specifies an editing object portion after checking the translation result; in another embodiment, an annotation may instead be added to an original text and the translation process may be performed on the basis of the annotation. Namely, the original text with an attached annotation is read by a scanner, and the type of the annotation and the portion to which the annotation is added are identified so that the translation style is determined (whether the original text is preferable, which dictionary is to be used, and a priority order) after referring to both translation rule table Tr and dictionary table Tp. In this embodiment, one translation pass is omitted; therefore, the present embodiment is more effective in a case where a user is able to predict, after checking the original text, the part where a mistranslation is likely to occur.
When adding an annotation to a temporarily translated text, a document including the text may also be printed on a medium such as paper so that a user is able to write the annotation on the paper. In such a case, it is necessary to rescan the document with the annotation so that image data of the document is acquired.
Furthermore, in the embodiments described above, an editing (retranslation) process is performed after specifying every editing object portion; however, an editing process may also be performed each time an annotation is added to an editing object portion.
Needless to say, the contents of a document, the type of annotation, the specific wording of a note, and the dictionary used are not limited to those described above.
To address the problems described above, the present invention provides a translation processing method including: registering a type of annotation with a corresponding translation rule; identifying a document to be processed; extracting an annotation added to a text element from the identified document; identifying a type of the extracted annotation added to the text element; and translating the text element according to the registered translation rule corresponding to the identified type of the extracted annotation. According to an embodiment of the invention, a user specifies a part that is to be an editing object so that a desired translation rule is applied to the part at the time of translation, thereby improving the quality of translation.
In another embodiment, the present invention provides a translation processing method wherein the type of annotation is registered with a corresponding translation rule in a table.
In an embodiment, the translation rule includes a designation of a dictionary to be used in a translation process, or a priority according to which the dictionary is used.
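Using dictionaries according to a priority can be illustrated with a short Python sketch. The dictionary contents and the fallback of returning the word unchanged when no dictionary contains it are assumptions made for illustration only.

```python
from typing import Dict, List


def translate_word(word: str, dictionaries: List[Dict[str, str]]) -> str:
    """Look a word up in dictionaries ordered from highest to lowest priority.

    The first dictionary that contains the word supplies the translation.
    Assumption: a word found in no dictionary is returned unchanged.
    """
    for dictionary in dictionaries:
        if word in dictionary:
            return dictionary[word]
    return word


# Hypothetical dictionary data.
standard = {"driver": "運転手", "bus": "バス"}
technical = {"driver": "ドライバ"}

# The technical dictionary has priority over the standard dictionary.
preferred = translate_word("driver", [technical, standard])
fallback = translate_word("bus", [technical, standard])
```

Reversing the order of the list gives the standard dictionary priority, so the same word can receive a different translation without changing either dictionary.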
In an embodiment, the present invention provides a document translation device comprising: a memory that stores a type of annotation with a corresponding translation rule in a table; an identifying part that identifies a document to be processed; an extracting part that extracts a type of annotation and character information from the document identified by the identifying part; an annotation identifying part that identifies a text element to which the annotation extracted by the extracting part is added; a translation rule determining part that determines a translation rule corresponding to the type of annotation by referring to the table; and a translation performing part that translates the text element identified by the annotation identifying part, by applying the translation rule determined by the translation rule determining part.
In an embodiment, the present invention provides a computer readable program that enables a computer to act as: a memory that stores a type of annotation with a corresponding translation rule; an identifying part that identifies a document to be processed; an extracting part that extracts an annotation added to a text element from the document identified by the identifying part; an annotation identifying part that identifies a type of the annotation added to the text element extracted by the extracting part; and a translation performing part that translates the text element according to the translation rule corresponding to the type of the annotation identified by the annotation identifying part.
The foregoing description of the embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments, and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
The entire disclosure of Japanese Patent Application No. 2005-90203 filed on Mar. 25, 2005 including specification, claims, drawings and abstract is incorporated herein by reference in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2005-090203 | Mar 2005 | JP | national |