1. Technical Field
The present invention relates to technology for translating a document from one language to another by a computer.
2. Related Art
In recent years, translation devices that convert a document from one language to another have come into use. In particular, devices have been developed in which, when a translation source document (manuscript) is provided as a paper document, the paper document is optically read and digitized, character recognition is performed, and then automatic translation is performed (see, for example, JP H08-006948A).
When using a device that performs automatic translation as described above, the user must specify languages by inputting (or selecting) a translation source language and a translation destination language on the device. Such an input operation is often complicated; when, for example, the user does not use the device on a daily basis, the input operation takes time and the user's work efficiency decreases. To address this problem, devices have been developed that display a message prompting the user for an operation input on a liquid crystal display or the like. Even in this case, however, when the message is displayed in, for example, Japanese, a user who cannot understand Japanese cannot understand the meaning of the displayed message, and it is difficult for such a user to perform the input operation.
A document processing device comprises: an image capturing unit that captures an image from sheet-like media, and acquires image data that represents the image as a bitmap; a region separating section that extracts from the image data image data of a printed region and image data of a hand-drawn region; a printed text data acquiring section that acquires printed text data that represents the contents of printed characters in the printed region; a hand-drawn text data acquiring section that acquires hand-drawn text data that represents the contents of hand-drawn characters in the hand-drawn region; a printed language specifying section that specifies the language of the printed text data; a hand-drawn language specifying section that specifies the language of the hand-drawn text data; and a translation processing section that generates translated text data by translating the printed text data from the language that has been specified by the printed language specifying section to the language that has been specified by the hand-drawn language specifying section.
Embodiments of the present invention will be described in detail below with reference to the accompanying figures.
Embodiment 1
Following is a description of a first embodiment of the present invention. First, the main terminology used in the present embodiment will be defined. The term “printed character” is used to mean a character obtained by transcribing a character shape of a specified typeface such as Gothic or Mincho, and the term “hand-drawn character” is used to mean a character other than a printed character. Further, the term “document” is used to mean a sheet-shaped medium (such as paper, for example) on which information is written in the form of characters. Hand-drawn characters that pertain to the handling or correction of a passage written in printed characters, and that have been added by a person who has read that passage, are referred to as an “annotation”.
Next is a description of the configuration of a multifunctional machine 1 of the present embodiment, with reference to the accompanying block diagram.
An image capturing unit 13 optically scans a document and captures an image of that document. The image capturing unit 13 is provided with a loading unit in which a document is loaded; it captures an image of a document loaded in this loading unit by optically scanning the document, and generates binary bitmap image data. An image forming unit 14 prints image data on paper. Based on the image data supplied by the control unit 11, the image forming unit 14 irradiates a photosensitive drum (not shown in the figure) with image light to form a latent image as a difference in electrostatic potential, develops this latent image into a toner image by selectively affixing toner, and forms an image on the paper by transferring and fixing that toner image.
A display 15 displays an image or the like that shows a message or work status to a user according to a control signal from the control unit 11, and is configured from a liquid crystal display or the like, for example. An operating unit 16 outputs a signal corresponding to the user's operation input and the on-screen display at that time, and is configured from a touch panel or the like in which a numeric keypad, a start button, and a stop button are arranged on a liquid crystal display. By operating the operating unit 16, the user can input instructions to the multifunctional machine 1. A communications unit 17 is provided with various signal processing devices, and exchanges data with other devices under the control of the control unit 11.
Operation of the present embodiment will now be described. First, the user of the multifunctional machine 1 inputs a translation instruction by operating the operating unit 16. Specifically, the user loads a document that will be the target of translation processing in the loading unit of the image capturing unit 13, and inputs a translation instruction to the multifunctional machine 1 by operating the operating unit 16.
Next, the control unit 11 extracts, from the image data that has been generated, image data of a region in which printed characters are written (hereinafter referred to as a “printed region”) and image data of a region in which hand-drawn characters are written (hereinafter referred to as a “hand-drawn region”), and separates the image data of the printed region from the image data of the hand-drawn region (Step S3).
Extraction of image data is performed as follows. First, pixels represented by the image data of the document are scanned in the horizontal direction, and when the distance between two adjacent characters, that is, the width of a line of continuous white pixels, is less than a predetermined value X, those continuous white pixels are replaced with black pixels. This predetermined value X is made roughly equal to a value assumed to be the distance between adjacent characters. Likewise, the pixels are also scanned in the vertical direction, and when the width of a line of continuous white pixels is less than a predetermined value Y, those continuous white pixels are replaced with black pixels. This predetermined value Y is made roughly equal to a value assumed to be the interval between lines of characters. As a result, a region is formed that has been covered with black pixels.
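In code, the smearing just described might look like the following minimal sketch, assuming a binary bitmap in which 0 represents a white pixel and 1 a black pixel; the function names, the use of NumPy, and the sequential horizontal-then-vertical passes are illustrative choices rather than requirements of the embodiment.

```python
import numpy as np

def smear_runs(line, max_gap):
    """Replace runs of white pixels (0) shorter than max_gap with black (1)."""
    out = line.copy()
    run_start = None
    for i, px in enumerate(line):
        if px == 0 and run_start is None:
            run_start = i                 # a white run begins
        elif px == 1 and run_start is not None:
            if i - run_start < max_gap:
                out[run_start:i] = 1      # short white gap: fill with black
            run_start = None
    return out

def smear_image(bitmap, x, y):
    """Horizontal pass with threshold X, then vertical pass with threshold Y."""
    horiz = np.array([smear_runs(row, x) for row in bitmap])
    return np.array([smear_runs(col, y) for col in horiz.T]).T
```

Connected groups of black pixels in the result correspond to the covered regions; X and Y would be tuned to the expected character spacing and line spacing, respectively.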
When regions covered with black pixels have been formed, the operation proceeds to judging whether each region is a printed region or a hand-drawn region. Specifically, first, a region of interest that will be the target of processing is specified, the black pixels that were substituted within that region are returned to white pixels, and the contents of the original drawing are restored. Then, the pixels within the region are scanned in the horizontal direction, and it is judged whether or not the degree of variation in the pitch of continuous white pixels is less than a predetermined value. Ordinarily, for a region in which printed characters are written, the degree of variation in the pitch of continuous white pixels is less than the predetermined value, because the interval between two adjacent characters is roughly constant. On the other hand, for a region in which hand-drawn characters are written, the degree of variation is larger than the predetermined value, because the interval between two adjacent characters is not constant. By performing this judgment for each of the regions L1 to L3 shown in the figure, each region is classified as a printed region or a hand-drawn region.
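The pitch-variation judgment could be sketched as follows, reusing the same 0/1 pixel convention; measuring variation as the standard deviation of white-run widths, and the threshold value, are assumptions made for illustration.

```python
import numpy as np

def white_run_widths(row):
    """Widths of white-pixel runs between black pixels, excluding margins."""
    widths, run, seen_black = [], 0, False
    for px in row:
        if px == 1:
            if seen_black and run > 0:
                widths.append(run)        # a gap between two characters
            run, seen_black = 0, True
        elif seen_black:
            run += 1                      # extend the current white run
    return widths

def is_printed_region(region, max_variation=2.0):
    """Printed text: near-constant inter-character gaps, hence low variation."""
    gaps = [w for row in region for w in white_run_widths(row)]
    return bool(gaps) and float(np.std(gaps)) < max_variation
```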
Returning to the description of the overall flow of operations: next, the control unit 11 generates, from the image data of the printed regions, printed text data that represents the contents of the printed characters (Step S4).
Next, the control unit 11 generates, from the image data of the hand-drawn regions, hand-drawn text data that represents the contents of the hand-drawn characters (Step S5). In this step, the acquisition of hand-drawn text data is performed as follows. First, character images are extracted character by character from the image data and normalized. Then, the characteristics of each constituent element of the characters are extracted from the normalized images, and by comparing those extracted characteristics to characteristic data that has been prepared in advance as a dictionary, the constituent elements of the characters are determined. Finally, the character codes of the characters obtained by assembling the determined constituent elements in their original arrangement are output.
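As one concrete reading of this recognition step, the sketch below normalizes each character image, extracts simple zoning features (black-pixel density per grid cell), and selects the nearest dictionary entry. The feature set and the distance measure are illustrative assumptions; the embodiment does not prescribe them.

```python
import numpy as np

def normalize(char_img, size=16):
    """Crudely resample a binary character image to a fixed size."""
    h, w = char_img.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return char_img[np.ix_(ys, xs)]

def features(char_img, cells=4):
    """Black-pixel density in each cell of a cells x cells grid."""
    img = normalize(char_img)
    step = img.shape[0] // cells
    return np.array([img[r:r + step, c:c + step].mean()
                     for r in range(0, img.shape[0], step)
                     for c in range(0, img.shape[1], step)])

def recognize(char_img, dictionary):
    """dictionary: list of (character_code, feature_vector) pairs."""
    f = features(char_img)
    return min(dictionary, key=lambda entry: np.linalg.norm(f - entry[1]))[0]
```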
Next, the control unit 11 specifies the language of the printed text data (Step S6). Specifically, the control unit 11 searches the printed text data for predetermined words, unique to each language, that have been prepared in advance as a dictionary. The language to which the found words belong is specified as the language of the printed text data. A language is specified in the same manner for the hand-drawn text data (Step S7).
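The word-spotting judgment of Steps S6 and S7 might be realized as in the following sketch; the tiny marker-word dictionary is an illustrative stand-in for the dictionary prepared in advance.

```python
MARKER_WORDS = {
    "English": {"the", "and", "is"},
    "French":  {"le", "la", "est"},
    "German":  {"der", "und", "ist"},
}

def specify_language(text):
    """Return the language whose marker words occur most often, or None."""
    words = text.lower().split()
    counts = {lang: sum(w in markers for w in words)
              for lang, markers in MARKER_WORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

For example, specify_language("the cat and the hat") would return "English", while text containing none of the marker words yields None.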
The control unit 11 judges that the language of the printed text data is the translation source language and that the language of the hand-drawn text data is the translation destination language, and generates translated text data by translating the printed text data from the translation source language to the translation destination language (Step S8). Then, the translated text data that shows the result of translating the printed text data, together with the hand-drawn text data, is output and printed on paper by the image forming unit 14 (Step S9).
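Putting Steps S6 through S8 together under the same assumptions, with translate_fn standing in for whatever machine-translation backend the device uses:

```python
def translate_document(printed_text, hand_drawn_text, translate_fn):
    source = specify_language(printed_text)          # Step S6: source language
    destination = specify_language(hand_drawn_text)  # Step S7: destination
    return translate_fn(printed_text, source, destination)  # Step S8
```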
According to the present embodiment described above, when the multifunctional machine 1 reads a document to which an annotation has been added, the multifunctional machine 1 separates the image data of that document into image data of a region in which printed characters are written and image data of a region in which hand-drawn characters are written, and acquires text data from each set of separated image data. Language judgment processing is then performed on each set of text data, so that a translation source language and a translation destination language can be specified. As a result, even if the user of the multifunctional machine 1 does not input a translation source language or a translation destination language into the multifunctional machine 1, an original text is translated into a desired language by performing just a simple operation of inputting a translation instruction.
Embodiment 2
Following is a description of a second embodiment of the present invention. The hardware configuration of the multifunctional machine 1 of the present embodiment is the same as that of the first embodiment, except that a comparison image table TBL (shown by a dotted line in the figure) is stored.
The data structure of the comparison image table TBL is shown in the figure. In this table, comparison image data, such as passport image data for each language, is stored in association with the language that the image specifies.
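One plausible in-memory shape for the table is sketched below, pairing each stored comparison image with the language it implies; the field names and the placeholder bitmaps are assumptions made for illustration.

```python
import numpy as np

# Placeholder arrays stand in for scanned passport-page bitmaps.
comparison_image_table = [
    {"language": "Japanese", "image": np.zeros((64, 64), dtype=np.uint8)},
    {"language": "English",  "image": np.ones((64, 64), dtype=np.uint8)},
]
```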
Next is a description of the operation of the present embodiment. First, the user of the multifunctional machine 1 inputs a translation instruction by operating the operating unit 16. Specifically, the user loads a document that will be the target of translation processing along with their own passport (distinctive image) in the loading unit of the image capturing unit 13, and inputs a translation instruction to the multifunctional machine 1 by operating the operating unit 16.
Next, the control unit 11 performs layout analysis or the like on the image data using a predetermined algorithm, and extracts image data of a character region and image data of a passport image region (a distinctive image region) (Step S13). Specifically, the image data is divided into predetermined regions, and the type of each region (such as character or drawing) is judged. In the example shown in the figure, a character region and a passport image region are extracted.
Next, the control unit 11 generates text data from the image data of the character region (Step S14), and specifies the language of the generated text data (Step S15). This processing is performed in the same manner as in the first embodiment. Next, the control unit 11 compares the image data of the distinctive image region extracted in Step S13 with the passport image data stored in the comparison image table TBL, and specifies a translation destination language based on the degree of agreement between the image data (Step S16).
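Step S16 might then be realized as follows, using the table sketched above; scoring the degree of agreement as the fraction of matching pixels (both images assumed already scaled to the same dimensions) and the acceptance threshold are illustrative choices.

```python
import numpy as np

def agreement(img_a, img_b):
    """Degree of agreement: fraction of pixels that match."""
    return float(np.mean(img_a == img_b))

def specify_destination_language(distinctive_img, table, threshold=0.8):
    """Pick the stored comparison image that best matches the scan."""
    best = max(table, key=lambda e: agreement(distinctive_img, e["image"]))
    if agreement(distinctive_img, best["image"]) >= threshold:
        return best["language"]
    return None  # nothing in the table agrees well enough
```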
The control unit 11 judges that the language of the text data is the translation source language and that the language specified from the passport image data (distinctive image data) is the translation destination language, translates the text data from the translation source language to the translation destination language, and generates translated text data (Step S17). Then, the translated text data that shows the result of translating the text data is output and printed on paper by the image forming unit 14 (Step S18).
According to the present embodiment described above, when the multifunctional machine 1 reads a document and a distinctive image that specifies a language (a passport image), the multifunctional machine 1 separates image data of a region in which characters are written from image data of a region in which the distinctive image is formed, specifies the translation destination language from the image data of the distinctive image, acquires text data from the image data of the region in which characters are written, and specifies the language of that text data. In other words, it is possible to specify the translation source language from the text data and the translation destination language from the image data of the distinctive image. As a result, even if a user of the multifunctional machine 1 does not input a translation source language or a translation destination language into the multifunctional machine 1, an original document is translated into a desired language by performing just a simple operation of inputting a translation instruction, improving the user's work efficiency.
Embodiment 3
Following is a description of a third embodiment of the present invention. The hardware structure of the multifunctional machine 1 of the present embodiment is the same as that of the first embodiment, except that it is provided with a microphone 19 (shown by a dotted line in the figure).
Following is a description of the operation of the present embodiment. First, a user of the multifunctional machine 1 inputs a translation instruction by operating the operating unit 16 of the multifunctional machine 1. Specifically, the user inputs a translation instruction to the multifunctional machine 1 by putting a document that will be the target of translation processing on the loading unit of the image capturing unit 13 of the multifunctional machine 1 and operating the operating unit 16, and speaks a few words of the translation destination language into the microphone 19.
Next, the language of the audio data generated in Step S22 is determined (Step S26). This determination is performed as follows. The control unit 11 searches the audio data for predetermined words, unique to each language, that have been prepared in advance as a dictionary, and determines the language to which the found words belong to be the language of the audio data. The predetermined words are preferably selected from words of frequent use, such as “and”, “I”, or “we” in the case of English, or from conjunctions, prefixes, and the like.
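A minimal sketch of this judgment follows, assuming a hypothetical transcribe() speech-to-text function is available; the word spotting itself mirrors the text-based judgment of the first embodiment.

```python
FREQUENT_WORDS = {
    "English": {"and", "i", "we", "the"},
    "French":  {"et", "je", "nous", "le"},
}

def specify_audio_language(audio_data, transcribe):
    """transcribe: assumed speech-to-text function returning plain text."""
    words = transcribe(audio_data).lower().split()
    counts = {lang: sum(w in vocab for w in words)
              for lang, vocab in FREQUENT_WORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```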
The control unit 11 judges the language of the text data to be the translation source language and the language that has been specified from the audio data to be the translation destination language, translates the text data from the translation source language to the translation destination language, and generates translated text data (Step S27). Then, the translated text data is output and the translated text is printed on paper by the image forming unit 14 (Step S28).
According to the present embodiment described above, text data is obtained from the image data of the document, the language of that text data is specified, and the translation destination language is specified from the audio data that represents the audio that has been picked up. In this manner, even if the user of the multifunctional machine 1 does not input a translation source language or a translation destination language into the multifunctional machine 1, an original text is translated into a desired language by performing just a simple operation of inputting a translation instruction and audio, improving the user's work efficiency.
Embodiment 4
Following is a description of a fourth embodiment of the present invention.
Next is a description of the configuration of the audio recorder 2 with reference to the accompanying block diagram.
A display 25 displays an image or the like that shows a message or work status to a user, according to a control signal from the control unit 21. An operating unit 26 outputs a signal corresponding to the user's operation input and the on-screen display at that time, and includes a start button, a stop button, and the like. The user can input instructions to the audio recorder 2 by operating the operating unit 26 while looking at an image or message displayed on the display 25. A communications unit 27 includes one or more signal processing devices or the like, and exchanges data with the multifunctional machine 1 under the control of the control unit 21.
A barcode output unit 24 outputs a barcode by printing it on paper. The control unit 21 specifies a language by analyzing audio data with a predetermined algorithm, and converts information that represents the language that has been specified to a barcode. The barcode output unit 24 outputs this barcode by printing it on paper under the control of the control unit 21.
Next is a description of the configuration of the computer device 3 with reference to the accompanying block diagram.
Next is a description of the operation of the present embodiment. In the following description, audio data generated from a user's voice explaining the importance of the document, its general outline, or other information about the document is referred to as an “audio annotation”.
First, the operation in which the audio recorder 2 generates an audio annotation will be explained with reference to the accompanying flowchart.
When a language is specified, the control unit 21 of the audio recorder 2 converts information that includes the specified language and an ID (identifying information) for the audio annotation into a barcode, and causes the barcode output unit 24 to output that barcode by printing it on paper (Step S36).
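The information carried by the barcode could be packed as a short delimited string, as in the sketch below; the payload format is an assumption, and the actual barcode rendering and scanning are omitted.

```python
def encode_annotation_payload(language, annotation_id):
    """Pack the specified language and the audio annotation's ID."""
    return f"LANG={language};ID={annotation_id}"

def decode_annotation_payload(payload):
    """Recover (language, annotation_id) from a scanned payload string."""
    fields = dict(part.split("=", 1) for part in payload.split(";"))
    return fields["LANG"], fields["ID"]
```

For example, encode_annotation_payload("English", "0042") yields "LANG=English;ID=0042", which decode_annotation_payload inverts.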
An audio annotation and a barcode that represents the audio annotation are generated by the above processing. The user of the audio recorder 2 attaches the barcode that has been output to a desired location of the document.
Next is a description of the operation of the multifunctional machine 1. First, the user of the multifunctional machine 1 inputs a translation instruction by operating the operating unit 16 of the multifunctional machine 1 and the operating unit 26 of the audio recorder 2. Specifically, the user inputs a send instruction to send the audio annotation to the multifunctional machine 1 by operating the operating unit 26 of the audio recorder 2, and inputs a translation instruction to the multifunctional machine 1 by putting a document that will be the target of translation processing on the loading unit of the image capturing unit 13 of the multifunctional machine 1 and operating the operating unit 16.
In the second embodiment, the image data of the distinctive image region extracted in Step S13 was passport image data; in the present embodiment, it is the image data of the barcode attached to the document, and the control unit 11 specifies a language by analyzing that barcode.
Next, the control unit 11 judges the language of the text data to be the translation source language and the language that has been specified from the barcode (distinctive image data) to be the translation destination language, and generates translated text data by translating the text from the translation source language to the translation destination language (Step S17). Next, the audio annotation received from the audio recorder 2 is linked to the translated text data (Step S19), which is then output by sending it to the computer device 3 via the communications unit 17 (Step S18′). Accordingly, the translated text data to which the audio annotation has been added is sent to the computer device 3.
Next, the user operates the computer device 3 to display the translated text data received from the multifunctional machine 1 on the display 35. When the control unit 31 of the computer device 3 detects that a command to display the translated text data has been input, the translated text data is displayed on the display 35.
According to the present embodiment as described above, when the multifunctional machine 1 reads a document and a distinctive image that specifies a language (a barcode), the multifunctional machine 1 separates image data from that document into image data of a region in which printed characters are written and image data of a region in which the distinctive image is formed, specifies a translation destination language from the image data of the distinctive image, acquires text data from the image data of the region in which printed characters are written, and specifies a language for that text data. Namely, a translation source language can be specified from the text data, and a translation destination language can be specified from the image data of the distinctive image. By adopting such a configuration, even if the user of the multifunctional machine 1 does not input a translation source language or a translation destination language into the multifunctional machine 1, an original text is translated into a desired language by performing just a simple operation of inputting a translation instruction, and thus the work efficiency of the user is improved.
In the embodiment described above, an operation is described that translates a document to which one barcode has been added; however, as indicated for example by the dotted line F in the figure, a configuration in which a plurality of barcodes are added to a single document is also possible.
Embodiments of the present invention are described above, but the present invention is not limited to the aforementioned embodiments, and can be embodied in various other forms. Examples of such other forms are given below.
With respect to the second through fourth embodiments as well, a configuration may be adopted in which two or more devices connected by a communications network share the functions of those embodiments, and a system provided with those devices realizes the functions of the multifunctional machine 1 of those embodiments. For example, with respect to the second embodiment, a configuration may be adopted in which a dedicated server device that stores the comparison image table TBL is provided separately from the multifunctional machine, and the multifunctional machine queries that server device for the result of specifying a language.
The distinctive image may also be, for example, a logo, a pattern image, or the like. Even when a logo, a pattern image, or the like is used as the distinctive image, a configuration may be adopted in which image data for comparison is stored in the comparison image table TBL, as in the above embodiment, and a translation destination language is specified by matching image data, or a translation destination language is specified using a predetermined algorithm for analyzing such pattern images.
In the second embodiment, a configuration is adopted in which the multifunctional machine 1 simultaneously scans a document and a distinctive image that specifies a language, and image data of a character region and image data of a distinctive image region are extracted from the generated image data. However, a configuration may also be adopted in which the document and the distinctive image are separately scanned, and the image data of the document and the image data of the distinctive image are separately generated. For example, a configuration may be adopted in which a distinctive image input unit (loading unit) that inputs a distinctive image such as a passport or the like is provided separately from a document image input unit (loading unit), and the user inputs the distinctive image from the distinctive image input unit.
As described above, the present invention provides a document processing device that includes an image capturing unit that captures an image from sheet-like media, and acquires image data that represents the image as a bitmap, a region separating section that extracts from the image data image data of a printed region in which printed characters are written and image data of a hand-drawn region in which hand-drawn characters are written, a printed text data acquiring section that acquires printed text data that represents the contents of printed characters in the printed region from the image data of the printed region, a hand-drawn text data acquiring section that acquires hand-drawn text data that represents the contents of hand-drawn characters in the hand-drawn region from the image data of the hand-drawn region, a printed language specifying section that specifies the language of the printed text data, a hand-drawn language specifying section that specifies the language of the hand-drawn text data, a translation processing section that generates translated text data by translating the printed text data from the language that has been specified by the printed language specifying section to the language that has been specified by the hand-drawn language specifying section, and an output unit that outputs the translated text data.
According to this document processing device, image data of a region in which printed characters are written and image data of a region in which hand-drawn characters are written are separated from the document, and text data is individually acquired from each set of separated image data. By specifying a language for each set of text data, it is possible to specify a translation source language and a translation destination language.
Also, the present invention provides a document processing device that includes an image capturing unit that captures an image from sheet-like media, and acquires image data that represents the image as a bitmap, a region separating section that extracts from the image data image data of a character region in which characters are written and distinctive image data of a distinctive image region in which a distinctive image is formed that specifies a language, a text data acquiring section that acquires text data that represents the contents of characters in the character region from the image data of the character region, a character language specifying section that specifies the language of the text data, a translation destination language specifying section that specifies a translation destination language by analyzing the distinctive image data of the distinctive image region with a predetermined algorithm, a translation processing section that generates translated text data by translating the text data from the language that has been specified by the character language specifying section to the translation destination language, and an output unit that outputs the translated text data.
According to this document processing device, image data of a region in which a distinctive image that specifies a language is formed and image data of a region in which characters are written are separated from the document; a translation destination language is specified from the image data of the distinctive image, text data is acquired from the image data of the region in which characters are written, and the language of that text data is specified. That is, it is possible to specify the translation source language from the text data and the translation destination language from the image data of the distinctive image.
Also, the present invention provides a document processing device that includes an image capturing unit that captures an image from sheet-like media, and acquires image data that represents the image as a bitmap, a distinctive image capturing unit that scans a distinctive image that specifies a language, and acquires distinctive image data that represents the contents of the distinctive image as a bitmap, a text data acquiring section that acquires text data that represents the contents of characters from the image data, a character language specifying section that specifies the language of the text data, a translation destination language specifying section that specifies a translation destination language by analyzing the distinctive image data with a predetermined algorithm, a translation processing section that generates translated text data by translating the text data from the language that has been specified by the character language specifying section to the translation destination language, and an output unit that outputs the translated text data.
According to this document processing device, the translation destination language is specified from the image data of the distinctive image, text data is acquired from the image data of the document, and the language of that text data is specified. That is, it is possible to respectively specify the translation source language from the text data and the translation destination language from the image data of the distinctive image.
In an embodiment of the present invention, a configuration may be adopted in which a storage unit is provided that stores multiple sets of comparison image data, the translation destination language specifying section compares the distinctive image data with the comparison image data stored in the storage unit, and the translation destination language is specified based on the degree of agreement between the distinctive image data and the comparison image data.
Also, in another embodiment of the present invention, a configuration may be adopted in which the comparison image data is image data that shows an image of at least one of a passport, currency (a coin, a banknote, etc.), or a barcode.
Also, the present invention provides a document processing device that includes an image capturing unit that captures an image from sheet-like media, and acquires image data that represents the image as a bitmap, a text data acquiring section that acquires text data that represents the contents of characters from the image data, a character language specifying section that specifies the language of the text data, an audio input section that picks up a sound to generate audio data, a translation destination language specifying section that specifies a translation destination language by analyzing the audio data with a predetermined algorithm, a translation processing section that generates translated text data by translating the text data from the language that has been specified by the character language specifying section to the translation destination language, and an output unit that outputs the translated text data.
According to this document processing device, text data is acquired from the image data of the document, the language of that text data is specified, and a translation destination language is specified from the audio data of audio that has been collected. It is possible to respectively specify the translation source language from the text data and the translation destination language from the audio data.
According to an embodiment of the present invention, it is possible to perform translation processing by judging a translation destination language without a user inputting a translation destination language.
The foregoing description of the embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments are chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
The entire disclosure of Japanese Patent Application No. 2005-175615 filed on Jun. 15, 2005, including specification, claims, drawings and abstract, is incorporated herein by reference in its entirety.