1. Field of the Invention
The present invention relates to technologies for improving the accuracy of translation processing.
2. Description of the Related Art
With the arrival of the era of global communication, so-called machine translation has flourished. In machine translation, a computer translates a text in one language into another language by analyzing the structure of the document using dictionary data and a predetermined algorithm and by replacing characters (phrases) with other characters (phrases).
Machine translation has the advantage that a large quantity of documents can be translated extremely quickly, but it has the disadvantage that the quality of the translated documents is ordinarily not very high. In the translation processing stage, the translation style (for example, the dictionary data used and the translation processing algorithm) cannot be flexibly changed according to the content of the document (business document, technical document, etc.), and as a result, phrases of the source text are replaced by inappropriate phrases.
The present invention has been made in view of the above circumstances, and provides a document processing device that can improve the quality of translation.
In order to address the issues described above, the present invention provides a translation processing method that includes: inputting a document; extracting characteristic information from the input document; selecting a translation style according to the characteristic information; and translating the input document using the selected translation style.
Embodiments of the present invention will be described in detail based on the following figures, wherein:
Below follows a description of an embodiment according to the present invention, with reference to the drawings.
The memory 11 is a storage device such as RAM, ROM, or a hard disk. Besides storing dictionary data and other reference data that the control unit 10 needs when performing the processing described above, it stores a table Tc (details stated below), in which document characteristic information is stored in correspondence with the document type, and a table Tr (details stated below), which describes the translation style that should be applied for each identified document type.
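The two tables held in the memory 11 can be pictured as simple keyed mappings. The following is only a minimal sketch: the feature names, document type names, dictionary names, and the TranslationStyle structure are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationStyle:
    """Hypothetical record for one translation style."""
    dictionary: str  # name of the dictionary data to use (assumed label)
    algorithm: str   # name of the translation algorithm (assumed label)


# Table Tc: document type -> registered characteristic information.
# The feature names and values are illustrative assumptions.
TABLE_TC = {
    "business document": {"has_letterhead": True, "ruled_line_ratio": 0.05},
    "technical document": {"has_letterhead": False, "ruled_line_ratio": 0.30},
}

# Table Tr: document type -> translation style to apply.
TABLE_TR = {
    "business document": TranslationStyle("business_dictionary", "polite_register"),
    "technical document": TranslationStyle("technical_dictionary", "literal"),
}


def style_for(document_type: str) -> TranslationStyle:
    """Look up the translation style registered for a document type."""
    return TABLE_TR[document_type]
```

Keeping Tc and Tr keyed by the same document type names means that a type identified through Tc can be used directly as the lookup key for Tr.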
The input unit 12 is a scanner device or the like that reads a manuscript printed on paper or the like as digital image data and supplies it to the control unit 10 and the memory 11. The operating unit 13 is an input device such as a keyboard or a mouse, with which the user of the document processing device 1 can designate a translation target document and input various instructions related to registration of the translation style and other necessary information. The input instructions and information are supplied to the control unit 10. The display unit 14 is constituted from a graphics processor and a display device such as a liquid crystal display (neither is shown in the drawings), and shows the document and messages to the user under directions from the control unit 10. By inputting various instructions from the operating unit 13 while looking at the display unit 14, the user causes the document processing device 1 to execute the various processing described above. The output unit 15 is a printer for printing the manuscript after edit processing on paper or the like, a communications interface for performing appended information edit processing and supplying the obtained image data to a print device, a storage device for storing the document data on a storage medium such as flash memory or a CD-ROM, or the like.
Below, the successive flow of translation processing is explained.
Further, the processing of Steps S10 through S15 described above may be performed for other sample texts as necessary. As a result, for example, the characteristic information “objects such as solid lines and enclosing lines are compared to numerals and included in a predetermined ratio” and a document type name “chart, etc.” are associated and registered. In this way, the user repeatedly performs the processing of Steps S10 through S15 as necessary, for each of the document types that the user wants to register in the document processing device 1, and completes the registration operation. The user may also input the same type of sample document multiple times, and register the common characteristics of the characteristic information.
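Registering the common characteristics of several samples of the same document type can be sketched as follows. This is an illustrative assumption about the data layout (characteristic information as a dictionary of named items), and the function and item names are hypothetical.

```python
def common_characteristics(samples):
    """Keep only the items that have the same value in every sample."""
    if not samples:
        return {}
    first, *rest = samples
    return {
        item: value
        for item, value in first.items()
        if all(item in other and other[item] == value for other in rest)
    }


def register_document_type(registry, type_name, samples):
    """Store the shared characteristics under the given document type name."""
    registry[type_name] = common_characteristics(samples)
    return registry
```

For example, two samples that both contain a table but differ in their column count would be registered with only the "contains a table" item.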
Next, the operation of the document processing device 1 when performing translation processing of the document will be explained.
Next, the type of document is specified in Step S25. Specifically, the type determination unit 105 compares the characteristic information extracted in Step S24 with all of the characteristic information registered in the memory 11. The registered document type whose characteristic information has the greatest similarity is then determined to be the document type of the document. Then, referring to the table Tr, the translation style is determined according to the determined document type.
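The determination described above (choose the registered type with the greatest similarity, then look up its style in the table Tr) might be sketched as follows. The similarity measure used here, the fraction of items on which two sets of characteristic information agree, is an assumption; the embodiment does not fix a particular measure, and all names are hypothetical.

```python
def similarity(extracted, registered):
    """Fraction of items (over the union of item names) with equal values."""
    items = set(extracted) | set(registered)
    if not items:
        return 0.0
    agree = sum(
        1
        for item in items
        if item in extracted and item in registered
        and extracted[item] == registered[item]
    )
    return agree / len(items)


def determine_type(extracted, registry):
    """Return the registered document type with the greatest similarity."""
    return max(registry, key=lambda name: similarity(extracted, registry[name]))


def select_style(extracted, registry, table_tr):
    """Determine the document type, then look up its style in table Tr."""
    return table_tr[determine_type(extracted, registry)]
```

When two types tie on similarity, max returns the one registered first; a real device would need an explicit tie-breaking rule.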
Next, translation processing is performed for the character information of the document, using the translation style designated in Step S26. The results of the translation are displayed on the display unit 14, and are output as digital data or printed out on paper or the like according to predetermined instructions from the user (Step S27).
In this way, according to the present embodiment, the document characteristics (characteristic information) are associated with document types and registered in advance, and the document type is then specified from the characteristics of the document that is the translation target. Because the translation style most suitable for that document can be determined from the specified document type, it is possible to improve the quality of the translation.
The present invention is not restricted to the embodiment described above; various modifications are possible. Below, modified embodiments are disclosed. In the embodiment described above, a translation style that includes information such as the dictionary to be used is determined once a document type is specified. However, it is not necessary to perform character recognition processing when the document type is determined; character recognition processing may instead be performed using a dictionary specified as a result of determining the translation style. Because the accuracy of character recognition processing may differ according to the dictionary that is used, selecting the dictionary for character recognition according to the document type in this way makes it possible to improve the accuracy of character recognition. Even when character recognition processing is performed in order to determine the document type, as in the embodiment described above, character recognition processing may be performed again using the optimum dictionary determined from the identified document type. In this case, it is possible to further improve the character recognition accuracy.
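The last variation, in which character recognition is repeated with the dictionary chosen for the identified document type, can be sketched as a two-pass procedure. The recognizer and the type determination are passed in as functions because the embodiment does not specify them; every name below is hypothetical.

```python
def recognize_with_type_dictionary(image, recognize, determine_type, dictionary_for_type):
    """Two-pass character recognition.

    recognize(image, dictionary) -> text; dictionary may be None for a
    generic first pass. determine_type(text) -> document type name.
    dictionary_for_type maps a document type name to a dictionary name.
    """
    # First pass: recognize without a type-specific dictionary.
    first_pass_text = recognize(image, None)
    # Identify the document type from the first-pass result.
    document_type = determine_type(first_pass_text)
    # Second pass: recognize again with the dictionary for that type.
    dictionary = dictionary_for_type[document_type]
    second_pass_text = recognize(image, dictionary)
    return document_type, second_pass_text
```

Whether the second pass is worth its cost depends on how much the dictionary actually affects recognition accuracy for the documents at hand.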
Also, the content of the sample document and the characteristic information extracted from the sample document are not restricted to the items stated above. It is possible to read a sample document multiple times, extract common learned characteristic items, and register those items. Furthermore, instead of extracting characteristic information by scanning the document, it is also possible to determine a document type or translation style for the translation target, by storing a document template in the document processing device 1 as characteristic information and comparing the layout structure or the like of the document to be translated with the structure of the document template.
Also, when the type determination unit 105 judges the similarity of the characteristic information, all items of characteristic information may be used, or a portion of the items may be selected and used. The method of determining the similarity between the registered characteristic information and the characteristic information of the translation target document, and the method of determining the document type from that similarity, are both optional. For example, it is possible to provide a threshold value for the similarity of each item, and judge that an item matches when its threshold value is exceeded. It is also possible to assign a priority ranking to each document type, and when the characteristics match those of multiple document types, determine one document type according to the priority ranking. Also, it is possible to adopt a configuration wherein the user can freely rewrite the characteristic information used for registration processing of the document type.
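The per-item thresholds and the priority ranking mentioned above could be combined as in the following sketch. It assumes, purely for illustration, that each item is a numeric value between 0 and 1, that the per-item similarity is one minus the absolute difference, and that a document type matches only when all of its thresholded items match. The names are hypothetical.

```python
def matched_items(extracted, registered, thresholds):
    """Return the items whose per-item similarity exceeds its threshold."""
    matched = set()
    for item, threshold in thresholds.items():
        if item in extracted and item in registered:
            item_similarity = 1.0 - abs(extracted[item] - registered[item])
            if item_similarity > threshold:
                matched.add(item)
    return matched


def determine_type(extracted, registry, thresholds, priority):
    """Pick one matching document type, using priority to break ties.

    priority lists document type names from highest to lowest rank and is
    assumed to contain every name in registry. Returns None if no type matches.
    """
    candidates = []
    for name, registered in registry.items():
        required = set(thresholds) & set(registered)
        if required and matched_items(extracted, registered, thresholds) == required:
            candidates.append(name)
    if not candidates:
        return None
    return min(candidates, key=priority.index)
```

Returning None when nothing matches leaves room for a fallback, such as a default translation style or a prompt to the user.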
With respect to the registration of the translation style (the type of dictionary used, etc.) as well, the content and the method of designation are optional. For example, the contents of the table Tr may be rewritable by the user. Furthermore, instead of having a user write to the table Tr, the document processing device 1 may itself extract nouns from the character information obtained by the character recognition processing, extract the technical terminology included among those nouns using predetermined general dictionaries, associate the dictionary containing the greatest amount of that technical terminology with the document type of the document, and register that information. In this case, the time required for the user's registration operation is reduced.
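The automatic registration described above can be sketched as a counting step. The sketch assumes that the nouns have already been extracted and that each candidate dictionary is available as a set of technical terms; noun extraction itself is outside the sketch, and the names are hypothetical.

```python
def pick_dictionary(nouns, dictionaries):
    """Return the name of the dictionary containing the most of the nouns.

    dictionaries maps a dictionary name to the set of terms it contains.
    """
    counts = {
        name: sum(1 for noun in nouns if noun in terms)
        for name, terms in dictionaries.items()
    }
    return max(counts, key=counts.get)


def register_style(table_tr, document_type, nouns, dictionaries):
    """Associate the best-covering dictionary with the document type in Tr."""
    table_tr[document_type] = pick_dictionary(nouns, dictionaries)
    return table_tr
```

For example, a document whose nouns are mostly electronics terms would be associated with the electronics dictionary without the user writing to the table Tr.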
In order to address the issues described above, the present invention provides a translation processing method that includes: inputting a document; extracting characteristic information from the input document; selecting a translation style according to the characteristic information; and translating the input document using the selected translation style. According to the method of the present invention, the quality of translation is improved because a suitable translation style is selected according to the type of document.
In an embodiment of the present invention, information related to the layout structure of the document is included in the characteristic information. Furthermore, specific character information is included in the characteristic information. Furthermore, the translation style is selected using a table defining a correspondence between the translation style and the characteristic information. Furthermore, the translation style designates a dictionary used in the translating step.
From another point of view, the present invention provides a document processing device including: an input section that inputs a document; an extracting section that extracts characteristic information from the input document; a selecting section that selects a translation style according to the characteristic information; and a translation section that translates the input document using the selected translation style.
From still another point of view, the present invention provides a storage medium readable by a computer, the storage medium storing a program of instructions executable by the computer to perform a function including: inputting a document; extracting characteristic information from the input document; selecting a translation style according to the characteristic information; and translating the input document using the selected translation style.
The foregoing description of the embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments, and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
The entire disclosure of Japanese Patent Application No. 2005-90202 filed on Mar. 25, 2005 including specification, claims, drawings and abstract is incorporated herein by reference in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2005-090202 | Mar 2005 | JP | national |