1. Field of the Invention
The present invention relates to a document processing device that reads, translates, and outputs a document.
2. Description of the Related Art
In order to achieve the efficient usage of foreign language documents, devices have been developed that machine translate and output documents.
In such devices, the translation of only a portion of the document can be used as an abstract of the document, or as an index. However, because the information that comes before or after the extracted portion is omitted, the results of translating the portion as-is may lack a comprehensible meaning.
The present invention was made in view of the above circumstances and provides a document processing device that, even when a portion of a document is translated, can provide a translation having a comprehensible meaning.
In order to address the issues described above, the present invention provides, in one aspect, a document processing device that has a translation section that translates character data included in a designated area of a manuscript; and a replacing section that when the translated character data contains a reference term that refers to a target term that is not specified in the translated character data, replaces the reference term in the translated character data with a translation of the target term existing in an area of the manuscript other than the designated area.
With the document processing device according to the present invention, even when designating a portion of a document and performing translation work, it is possible to automatically search for required information and output a translated document with a high degree of completeness.
Embodiments of the present invention will be described in detail based on the following figures, wherein:
Below follows a description of an embodiment of the present invention, with reference to the drawings.
The reading section 10 is, for example, publicly known technology that, while moving the document along the reading face of the reading device, converts the brightness of each part of the document to binary image data, and ordinarily includes a hardware portion called a scanner that has an automatic paper feed mechanism. The area extraction section 12 extracts a portion of the image data, reflecting in some form the intent of a user. In this embodiment, a user interface 22 is provided so that a person can give an instruction to the area extraction section 12. This is performed, for example, by the area extraction section 12 displaying the image data obtained by the reading section 10 on a display, and the user designating an area on the display using a mouse or the like. A suitable configuration can be adopted for the user interface 22, such as a keyboard, touch panel, or the like, and if the document processing device already has such a configuration, that may also be used.
Alternatively, the user may indicate an extraction area by drawing a border directly on the document. In this case, if the area extraction section 12 has a function that detects that border directly, the user interface 22 is unnecessary. This method saves time when processing a large number of documents, because once a user takes a copy of an original document and draws a border on that copy, the device processes the document automatically.
The character recognition section 14 performs character recognition of the image data in the language of the source document designated in advance, and generates character data of the document. The translation section 16 is a conventional translation section that refers to a dictionary database, which is a corresponding table of the translation source language and the translation target language, and performs translation. The output section 20 may appropriately select a printer, display, or memory section. When the source document includes graphic information other than text, such as graphics, photographs, and the like, the output section 20 may recombine the translation results with the graphic information and output the recombined data.
The content checking section 18 retrieves reference terms from the content of the translation results. The content checking section 18 has a reference term database wherein these sorts of reference terms are stored beforehand, in a table format as shown in
The candidate terms in the column of the search target term of the table TBL shown in
If multiple candidates appear when a search is performed, one of the candidates is selected by a rule determined in advance. For example, the rule may retrieve the term whose position in the text passage is closest to the reference term. This rule may also be used in combination with a rule that assigns a frequency of occurrence to each term and establishes a priority accordingly.
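As an illustration only, the selection rule described above might be sketched as follows. The function name `select_candidate`, the token-position representation, and the use of frequency purely as a tie-breaker are assumptions made for this sketch; the description does not fix how the two rules are combined.

```python
from collections import Counter

def select_candidate(candidates, reference_pos, tokens):
    """Pick one target term from several candidates.

    candidates    -- list of (position, term) pairs found by the search
    reference_pos -- position of the reference term in the passage
    tokens        -- all tokens of the searched passage (for frequency counts)

    Assumed rule: the candidate closest to the reference term wins;
    if two candidates are equally close, the more frequent term wins.
    """
    frequency = Counter(tokens)
    position, term = min(
        candidates,
        key=lambda c: (abs(c[0] - reference_pos), -frequency[c[1]]),
    )
    return term

tokens = ["the", "engineers", "met", "the", "managers", "and", "they", "left"]
candidates = [(1, "engineers"), (4, "managers")]
chosen = select_candidate(candidates, reference_pos=6, tokens=tokens)
```

Here `chosen` is the candidate at position 4, because it lies two tokens from the reference term while the other candidate lies five tokens away.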
Conceptual terms such as “multiple people”, “multiple objects”, and “multiple animals” are set as target terms for “they” shown in
The operation of this embodiment will be explained below.
A manuscript is read by the reading section 10 (Step 1), and the area extraction section 12 checks whether or not there is a portion designation (Step 2). When a portion is designated by marking the manuscript, the presence or absence of a portion designation is judged on the image data. In a system wherein a user individually makes a designation for the image data, document image data is opened on a display or the like, the user is prompted to designate an area, and the designation is judged according to the response of the user. When there is no portion designation, the character recognition section 14 and the translation section 16 operate as usual, the entire area is translated (Step 3) and the output section 20 outputs the results (Step 4).
When it is judged in Step 2 that there is a portion designation, the area extraction section 12 extracts that designated area (Step 5), and performs character recognition and translation (Step 6). Next, the content checking section 18 checks whether or not there are reference terms in the results of the translation (Step 7). This is performed with reference to the left column of the table shown in
In the embodiment shown in
In Step 11, if there is a target term in the expanded area, that portion is translated, the reference term in the translation results is replaced with the translation of the corresponding target term (Step 12), and the result is output (Step 4). In the example shown in
In Step 11, when there is no target term in the expanded area, the possibility of further expansion is judged (Step 13), and when expansion is possible, the procedure returns to Step 9 and the steps through Step 11 are repeated. When there is no space to expand in the manuscript, the results are output with the reference term remaining as-is (Step 4). In this case, it is possible to output the results with a comment attached stating that the reference term content is unclear, and provide a warning to this effect by a separate method (such as a display by a display section or audio guidance using a speech synthesis device). A user can adopt a policy of supplying the previous page to the reading section or the like in response to such a warning. And, when designating a portion and translating in this way, because it is possible that there is necessary information on the pages before and after the designated portion, it is also possible to initially include the pages before and after the designated portion when reading the document.
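The search-and-expand loop described above can be sketched roughly as follows. This is a minimal sketch under several assumptions that are not taken from the description: the manuscript is modeled as a list of sentences, the area is expanded by one sentence in each direction per pass, the translation step is omitted so the loop stays visible, and `REFERENCE_TABLE`, `find_reference_terms`, and `resolve` are hypothetical names.

```python
import re

# Hypothetical reference term table: reference term -> candidate target terms.
REFERENCE_TABLE = {"they": ["engineers", "managers"]}

def find_reference_terms(text, table):
    """Check for reference terms in the designated text (compare Step 7)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [ref for ref in table if ref in words]

def contains(term, sentences):
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return any(pattern.search(s) for s in sentences)

def resolve(sentences, start, end, table=REFERENCE_TABLE):
    """Return (text, unresolved) for the designated area sentences[start:end]."""
    text = " ".join(sentences[start:end])
    unresolved = []
    for ref in find_reference_terms(text, table):
        # If a target term is already inside the designated area, keep the text.
        if any(contains(c, sentences[start:end]) for c in table[ref]):
            continue
        lo, hi, target = start, end, None
        # Keep expanding while no target is found and room remains (Step 13).
        while target is None and (lo > 0 or hi < len(sentences)):
            new_lo, new_hi = max(lo - 1, 0), min(hi + 1, len(sentences))
            added = sentences[new_lo:lo] + sentences[hi:new_hi]
            lo, hi = new_lo, new_hi
            # Search only the newly added area for a target term (Step 11).
            target = next((c for c in table[ref] if contains(c, added)), None)
        if target is None:
            unresolved.append(ref)  # output as-is, with a warning to the user
        else:
            # Replace the reference term with the target term (Step 12).
            text = re.sub(rf"\b{re.escape(ref)}\b", target, text,
                          flags=re.IGNORECASE)
    return text, unresolved

sentences = [
    "The engineers built a prototype.",
    "It was tested in May.",
    "They shipped it in June.",
]
result = resolve(sentences, 2, 3)
```

With only the third sentence designated, the loop expands twice before it finds a target term in the first sentence. When the whole manuscript has been searched without success, the reference term is returned in `unresolved`, which corresponds to the case where a comment or warning would be issued.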
In the above embodiment, the reference term is a pronoun, and words mentioned earlier in the text are searched, but among the reference terms there are also cases when the target term is explained after the reference term, as in “X as described below”. In such a case, the searched target term is “X” itself, and when replacing the search results, the replacement also includes that explanation.
In this embodiment, the presence or absence of a reference term is checked after translation is performed, but this may also be checked in the original text. In that case, all of the work of the content checking section 18 is performed in the language of the translation source, including the replacement in Step 12 of
As described above, the present invention provides, in one aspect, a document processing device that has a translation section that translates character data included in a designated area of a manuscript; and a replacing section that when the translated character data contains a reference term that refers to a target term that is not specified in the translated character data, replaces the reference term in the translated character data with a translation of the target term existing in an area of the manuscript other than the designated area.
As described above, the present invention also provides, in one aspect, a document processing device that has a replacing section that when character data included in a designated area of a manuscript contains a reference term that refers to a target term that is not specified in the character data, replaces the reference term in the character data with the target term existing in an area of the manuscript other than the designated area; and a translation section that translates the character data included in the designated area.
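The two aspects above differ only in the order of the replacing and translating operations. The toy sketch below contrasts them; the word-for-word `DICTIONARY`, the function names, and the assumption that the target term has already been located are all hypothetical, and a real translation section would be far more involved.

```python
# Toy word-for-word dictionary (hypothetical), standing in for a translation section.
DICTIONARY = {"they": "ellos", "engineers": "ingenieros", "left": "salieron"}

def translate(text):
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(DICTIONARY.get(w, w) for w in text.lower().split())

def replace_term(text, reference, target):
    """Replace every occurrence of the reference term with the target term."""
    return " ".join(target if w == reference else w for w in text.lower().split())

def translate_then_replace(text, reference, target):
    # First aspect: translate the designated area, then replace the
    # translated reference term with the translation of the target term.
    translated = translate(text)
    return replace_term(translated, translate(reference), translate(target))

def replace_then_translate(text, reference, target):
    # Second aspect: replace in the source language, then translate.
    return translate(replace_term(text, reference, target))

first = translate_then_replace("they left", "they", "engineers")
second = replace_then_translate("they left", "they", "engineers")
```

In this toy setting both orderings yield the same string. With a real translation section the results may differ, since replacing before translation lets the translator see the target term in context.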
According to one of the foregoing embodiments of the invention, the designated area may be designated by markings on the manuscript. According to one of the foregoing embodiments of the invention, the document processing device may further comprise an input section for a user to designate the designated area.
According to one of the foregoing embodiments of the invention, when the target term is not specified, the translated character data may be output together with a message that the target term is not specified. According to one of the foregoing embodiments of the invention, the document processing device may further comprise a warning section that provides a warning to a user when the target term is not specified. Further, according to one of the foregoing embodiments of the invention, the target term may be specified using a table defining a correspondence between the target term and the reference term.
The present invention also provides, in one aspect, a method of processing character data that includes translating character data included in a designated area of a manuscript; and replacing, when the translated character data contains a reference term that refers to a target term that is not specified in the translated character data, the reference term in the translated character data with a translation of the target term existing in an area of the manuscript other than the designated area.
The present invention also provides, in one aspect, a method of processing character data that includes replacing, when character data included in a designated area of a manuscript contains a reference term that refers to a target term that is not specified in the character data, the reference term in the character data with the target term existing in an area of the manuscript other than the designated area; and translating the character data included in the designated area.
The present invention also provides, in one aspect, a computer readable recording medium recording a program that causes a computer to execute one of the foregoing methods.
The foregoing description of the embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
The entire disclosure of Japanese Patent Application No. 2005-090174 filed on Mar. 25, 2005 including specification, claims, drawings and abstract is incorporated herein by reference in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2005-090174 | Mar 2005 | JP | national |