The present application claims priority under 35 U.S.C. § 119 to Japanese Patent Application No. 2016-149068, filed on Jul. 28, 2016. The contents of this application are incorporated herein by reference in their entirety.
The present disclosure relates to an image forming apparatus for digitizing a document based on a markup document, which is an original document modified by handwriting. The present disclosure also relates to a storage medium and a method for digitizing a document.
An existing document editing device digitizes a document based on a markup document, which is an original document modified by handwriting.
An image forming apparatus according to an aspect of the present disclosure includes a central processing unit (CPU), a storage device, and a reading device. The storage device stores therein a document digitization program. The reading device reads an image from an original document. The CPU executes the document digitization program to implement an image acquiring section, an added handwriting extracting section, and a document editing section. The image acquiring section acquires an image of a markup document using the reading device. The markup document is the original document modified by handwriting. The added handwriting extracting section extracts an added handwriting from the image of the markup document acquired by the image acquiring section. The document editing section edits a raw original document in accordance with a modification instruction given through the added handwriting extracted by the added handwriting extracting section to generate a digitized document. The raw original document is the original document without the modification. The raw original document includes one or more characters and one or more graphics. The document editing section alters a position of at least some of the characters and the graphics included in the raw original document to generate the digitized document.
A non-transitory computer-readable storage medium according to another aspect of the present disclosure stores thereon a document digitization program. The document digitization program causes an image forming apparatus to implement an image acquiring section, an added handwriting extracting section, and a document editing section. The image forming apparatus includes a reading device. The reading device reads an image from an original document. The image acquiring section acquires an image of a markup document using the reading device. The markup document is the original document modified by handwriting. The added handwriting extracting section extracts an added handwriting from the image of the markup document acquired by the image acquiring section. The document editing section edits a raw original document in accordance with a modification instruction given through the added handwriting extracted by the added handwriting extracting section to generate a digitized document. The raw original document is the original document without the modification. The raw original document includes one or more characters and one or more graphics. The document editing section alters a position of at least some of the characters and the graphics included in the raw original document to generate the digitized document.
A method for digitizing a document according to another aspect of the present disclosure is implemented by an image forming apparatus including a reading device. The reading device reads an image from an original document. The method for digitizing a document includes: acquiring an image of a markup document using the reading device, the markup document being the original document modified by handwriting; extracting an added handwriting from the image of the markup document acquired in the acquiring; and altering a position of at least some of one or more characters and one or more graphics included in a raw original document in accordance with a modification instruction given through the extracted added handwriting to generate a digitized document, the raw original document being the original document without the modification.
The following describes an embodiment of the present disclosure with the use of the drawings.
First, a configuration of a multifunction peripheral (MFP) 10 serving as an image forming apparatus according to the present embodiment will be described.
As illustrated in
The storage section 17 stores therein a document digitization program 17a. The document digitization program 17a digitizes a document based on an original document modified by handwriting (hereinafter referred to as “a markup document”). The document digitization program 17a may be installed on the MFP 10 during production of the MFP 10, or may be additionally installed on the MFP 10 from a storage medium such as an SD card or a universal serial bus (USB) memory device, or may be additionally installed on the MFP 10 from a network.
The storage section 17 can store therein specific layout information 17b indicating a specific layout. The specific layout is for example a header layout, a footer layout, and/or a column layout for text. The storage section 17 may store the specific layout information 17b for each of users of the MFP 10 or for each of groups to which users of the MFP 10 belong. The MFP 10 can learn a possible original document in advance and thereby generate the specific layout information 17b. For example, in a case where a frequency at which a user lays out original documents as two columns is greater than or equal to a specific frequency, the MFP 10 includes, in the specific layout information 17b of the user, a layout that shows the text in two columns.
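The frequency-based learning described above can be sketched as follows. This is only an illustrative sketch: the function name `learn_specific_layouts`, the layout labels, and the threshold value of 0.5 are assumptions for the example and do not appear in the embodiment.

```python
from collections import Counter


def learn_specific_layouts(observed_layouts, min_frequency=0.5):
    """Return the layouts whose share of a user's documents meets the threshold.

    observed_layouts: one hypothetical layout label per learned original document.
    min_frequency: the "specific frequency" (an assumed value, as a fraction).
    """
    if not observed_layouts:
        return []
    counts = Counter(observed_layouts)
    total = len(observed_layouts)
    return sorted(name for name, n in counts.items() if n / total >= min_frequency)


# Three of four learned documents use two columns, so that layout is kept.
history = ["two_column", "two_column", "single_column", "two_column"]
specific_layout_info = learn_specific_layouts(history, min_frequency=0.5)
```

The same function could be run once per user or once per group, matching the per-user and per-group storage described above.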
The storage section 17 can store therein character attribute information 17c. The character attribute information 17c refers to character attributes such as size, font type, font weight, and distance between characters. The character attribute information 17c may refer to character attributes depending on the location of the characters, such as header, footer, and text body. The storage section 17 may store the character attribute information 17c for each of users of the MFP 10 or for each of groups to which users of the MFP 10 belong. The MFP 10 can learn a possible original document in advance and thereby generate the character attribute information 17c.
The controller 18 for example includes a central processing unit (CPU), read only memory (ROM), and random access memory (RAM). The ROM stores thereon a program and various types of data. The RAM is used as a work area of the CPU of the controller 18. The CPU of the controller 18 executes the program stored in the ROM of the controller 18 or the storage section 17.
The controller 18 implements an image acquiring section 18a, an added handwriting extracting section 18b, a raw original document reproduction section 18c, an area extracting section 18d, a layout plan determination section 18e, and a document editing section 18f by executing the document digitization program 17a stored in the storage section 17. The image acquiring section 18a acquires an image of the markup document, which is the original document modified by handwriting, using a scanner 13. The added handwriting extracting section 18b extracts handwritten modification instructions, in other words added handwritings, from the image of the markup document acquired by the image acquiring section 18a. The raw original document reproduction section 18c reproduces an original document without the modification by handwriting, in other words a raw original document, from the image of the markup document. The area extracting section 18d extracts from the raw original document each of character areas and graphic areas in the raw original document. The layout plan determination section 18e determines a layout plan of the raw original document based on the areas extracted by the area extracting section 18d. The document editing section 18f edits the raw original document of the markup document in accordance with the modification instructions given through the added handwritings extracted by the added handwriting extracting section 18b to generate a digitized document.
The following describes operation of the MFP 10 for digitizing a document based on a markup document.
When an instruction instructing digitization of a document based on a markup document is input via the operation section 11, the controller 18 performs a process illustrated in
As illustrated in
The image 20 illustrated in
The instruction 31 is an instruction to add characters “1/2” to the right end of a header.
The instruction 32 is an instruction to add characters “of” between characters “Structure” and characters “Document”. The instruction 32 includes a symbol 32a for instructing a character insertion.
The instruction 33 is an instruction to delete three characters “bbb”. The instruction 33 is made of a symbol 33a for instructing a character deletion.
The instruction 34 is an instruction to swap a line that reads “ccc” with a line that reads “ddddd”. The instruction 34 is made of a symbol 34a for instructing a line swap.
The instruction 35 is an instruction to add characters “ttttt” between characters “fff” and characters “fffff”. The instruction 35 includes a symbol 35a for instructing a character insertion.
The instruction 36 is an instruction to delete a graphic. The instruction 36 is made of a symbol 36a for instructing a graphic deletion.
The instruction 37 is an instruction to move a graphic. The instruction 37 is made of a symbol 37a for instructing a graphic move.
The instruction 38 is an instruction to delete characters “
As illustrated in
As illustrated in
As illustrated in
The image 40 illustrated in
As illustrated in
When determining in S105 that the image 40 has character areas therein, the layout plan determination section 18e performs optical character recognition (OCR) on each of the character areas thereby to recognize characters in the character area (S106).
When determining in S105 that the image 40 has no character areas or when the step S106 is complete, the layout plan determination section 18e generates original document layout information (S107). The original document layout information indicates placement of each of the character areas and the graphic areas, which are extracted in S104, in the original document layout.
For example, the layout plan determination section 18e determines, with respect to each of the character areas and the graphic areas, a start position (a left end), a center position, and an end position (a right end) in a left-right direction of the image 40 of the raw original document as well as a start position (an upper end) and an end position (a lower end) in a top-bottom direction of the image 40 of the raw original document. In a case where some of the thus determined positions of an area and some of the thus determined positions of another area in the image 40 of the raw original document coincide, the layout plan determination section 18e determines, as the layout plan of the image 40 of the raw original document, that such areas are placed in accordance with such coinciding positions in the layout. This is because it is likely that such positions were made to coincide on purpose.
The layout plan determination section 18e also determines distances between areas. In a case where a distance determined for areas is shorter than a specific distance, the layout plan determination section 18e determines, as the layout plan of the image 40 of the raw original document, that the distance between the areas is maintained in the layout. The specific distance is for example a distance equivalent to two lines of characters having a specific size.
The layout plan determination section 18e for example determines, as the layout plan of the image 40 of the raw original document, that the start positions of the areas 41 to 43 in the left-right direction are aligned as indicated by a line 51. For another example, the layout plan determination section 18e determines, as the layout plan of the image 40 of the raw original document, that the end positions of the areas 42 and 43 in the left-right direction are aligned as indicated by a line 52. For another example, the layout plan determination section 18e determines, as the layout plan of the image 40 of the raw original document, that the center positions of the areas 44 to 47 in the left-right direction are aligned as indicated by a line 53. For another example, the layout plan determination section 18e determines, as the layout plan of the image 40 of the raw original document, that all of the distances 54, 55, 56, 57, and 58 are maintained.
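The determination of coinciding positions and maintained distances described above can be sketched as follows. The `Area` class, the function names, the coordinate values, and the tolerance are hypothetical and chosen only for illustration. The embodiment does not state how closely two positions must match to count as coinciding, so the tolerance is an assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Area:
    """A character or graphic area given by its bounding box (hypothetical units)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def anchors(self):
        # Start, center, and end in the left-right direction,
        # plus start and end in the top-bottom direction.
        return {
            "start_x": self.left,
            "center_x": (self.left + self.right) / 2,
            "end_x": self.right,
            "start_y": self.top,
            "end_y": self.bottom,
        }


def find_alignments(areas, tolerance=1.0):
    """Return (anchor kind, area, area) triples whose positions coincide."""
    found = []
    for a, b in combinations(areas, 2):
        pa, pb = a.anchors(), b.anchors()
        for kind in pa:
            if abs(pa[kind] - pb[kind]) <= tolerance:
                found.append((kind, a.name, b.name))
    return found


def maintained_gaps(areas, specific_distance):
    """Return vertical gaps shorter than the specific distance, which are kept."""
    ordered = sorted(areas, key=lambda a: a.top)
    kept = []
    for upper, lower in zip(ordered, ordered[1:]):
        gap = lower.top - upper.bottom
        if 0 <= gap < specific_distance:
            kept.append((upper.name, lower.name, gap))
    return kept


areas = [
    Area("41", 10, 10, 60, 20),
    Area("42", 10, 30, 90, 60),
    Area("43", 10, 70, 90, 100),
]
alignments = find_alignments(areas)
gaps = maintained_gaps(areas, specific_distance=24)
```

With these assumed coordinates, the three areas share a start position (comparable to the line 51), and the areas 42 and 43 also share an end position (comparable to the line 52).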
As illustrated in
As illustrated in
Next, based on the image 20 read in S101 and the image 30 of the added handwritings extracted in S102, the document editing section 18f divides the added handwritings according to the distances between the added handwritings and contents of the added handwritings (S132). For example, in
After the step S132, the document editing section 18f selects an unselected one of the added handwritings, which are divided in S132, as a target (S133).
Next, the document editing section 18f determines a type of the instruction of the currently-selected target handwriting (S134).
As illustrated in
Next, the document editing section 18f specifies a position to which the character from the currently-selected target handwriting is to be added (S136).
More specifically, in a case where a position to which the character from the currently-selected target handwriting is to be added is designated in a character area included in the specific layout information 17b and the original document layout information, the document editing section 18f specifies the designated position in S136.
In a case where a position to which the character from the currently-selected target handwriting is to be added is not particularly specified in a character area included in the specific layout information 17b and the original document layout information, the document editing section 18f specifies an appropriate position in the area based on the specific layout information 17b, the original document layout information, and the position of the currently-selected target handwriting in the markup document in S136. For example, in a case where the start position of the currently-selected target handwriting is located close to the start positions of separate areas that are aligned in the left-right direction of the image being edited, the document editing section 18f puts the start position of the currently-selected target handwriting in alignment with the start positions of the separate areas. Although starting positions of areas in the left-right direction of the image being edited have been described above, the same is true of center positions and end positions of areas in the left-right direction of the image being edited, and start positions and end positions of areas in the top-bottom direction of the image being edited. The document editing section 18f may separate the area of the currently-selected target handwriting from an area adjacent thereto by the same distance as the distance between areas located close to the area of the currently-selected target handwriting.
If no regularity is found for the currently-selected target handwriting in terms of the start position, the center position, and the end position of the area thereof in the left-right direction of the image being edited as well as the start position and end position in the top-bottom direction of the image being edited, the document editing section 18f may specify the position of the handwriting of the currently-selected target handwriting as the position to which the character from the currently-selected target handwriting is to be added. For example, for adding a new character area 48 to a space under the area 43, the document editing section 18f defines the start position and the end position of the area 48 in the left-right direction using the line 51 and the line 52, respectively, and positions the area 48 so that a distance 59 between the area 43 and the area 48 is equal to the distance 55 between the area 42 and the area 43 as illustrated in
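The positioning rule for an added area can be sketched as follows. The function names, the coordinate values, and the `max_offset` threshold for "located close to" are assumptions for illustration. The embodiment does not give a numeric threshold.

```python
def snap_position(handwriting_pos, alignment_lines, max_offset=5.0):
    """Snap a handwritten position to the nearest alignment line.

    If no alignment line lies within max_offset (an assumed threshold),
    no regularity is found and the handwritten position itself is used.
    """
    candidates = [
        line for line in alignment_lines
        if abs(line - handwriting_pos) <= max_offset
    ]
    if not candidates:
        return handwriting_pos
    return min(candidates, key=lambda line: abs(line - handwriting_pos))


def place_below(upper_bottom, reference_gap):
    """Top of a new area separated from the area above by a reference gap."""
    return upper_bottom + reference_gap


# Hypothetical coordinates: line 51 at x=10, line 52 at x=90,
# the area 43 ends at y=100, and the distance 55 is 10.
line_51, line_52 = 10, 90
new_left = snap_position(12, [line_51, line_52])   # snaps to line 51
new_right = snap_position(87, [line_51, line_52])  # snaps to line 52
new_top = place_below(upper_bottom=100, reference_gap=10)
```

Under these assumptions the new area starts at the line 51, ends at the line 52, and sits below the area 43 at the same distance that separates the areas 42 and 43.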
After the step S136, the document editing section 18f specifies attributes of the character from the currently-selected target handwriting (S137). For example, in a case where the image 40 of the raw original document has an area to which the character from the currently-selected target handwriting is to be added, the document editing section 18f acquires attributes of characters located around the position in the area to which the character from the currently-selected target handwriting is to be added. The document editing section 18f then specifies the acquired attributes as the attributes of the character from the currently-selected target handwriting.
After the step S137, the document editing section 18f adds the character recognized in S135 to the position, which is specified in S136, in the image being edited with the attributes specified in S137 or with the attributes indicated by the character attribute information 17c (S138).
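The choice of attributes in S137 and S138 can be sketched as follows. The `CharAttributes` class, the function name, and the rule of taking the most common attributes among nearby characters are assumptions for illustration. The embodiment says only that attributes of characters around the insertion position are acquired.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CharAttributes:
    """Character attributes comparable to the character attribute information 17c."""
    size_pt: float
    font: str
    weight: str
    spacing_pt: float


def attributes_for_insertion(neighbours, stored_defaults, location):
    """Pick attributes for an added character.

    neighbours: attributes of characters around the insertion position
        (empty when the character goes into a new area).
    stored_defaults: hypothetical mapping from location ("header",
        "footer", "body") to stored attributes.
    """
    if neighbours:
        # Assumption: use the attributes most common among the neighbours.
        return Counter(neighbours).most_common(1)[0][0]
    return stored_defaults[location]


body = CharAttributes(10.5, "Serif", "regular", 0.0)
header = CharAttributes(9.0, "Sans", "bold", 0.0)
defaults = {"header": header, "body": body}

in_existing_area = attributes_for_insertion([body, body, header], defaults, "body")
in_new_area = attributes_for_insertion([], defaults, "header")
```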
In a case where the position to which the character from the currently-selected target handwriting is to be added is located in the middle of an existing area, for example, the document editing section 18f adds the character from the currently-selected target handwriting to the position, and accordingly moves backward characters, among the characters in the existing area, that should follow the character from the currently-selected target handwriting by the number of added characters. The position located in the middle of an existing area is for example a position between characters in a line in a character area included in the specific layout information 17b and the original document layout information. In a case where a character is added to a paragraph in an area, and accordingly characters that should follow the added character are moved backward, the document editing section 18f maintains the paragraph after moving backward the characters. In such a case, the document editing section 18f determines a line indented in the area to be a starting line of the paragraph. Furthermore, the document editing section 18f determines a line that ends with some space, a line immediately before a starting line of a following paragraph, or a last line in the area to be an ending line of the paragraph. Furthermore, after adding the character from the currently-selected target handwriting, the document editing section 18f moves backward characters following the area including the added character as necessary by an increase in the size of the area as a result of the addition. In a case where a distance between separate areas located downward of the area including the added character is greater than a specific distance, however, a lower area of the separate areas is not moved backward until the distance between the separate areas becomes equal to the specific distance. The specific distance is for example a distance equivalent to two lines of characters having a specific size.
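The paragraph rules above can be sketched as follows. The sketch assumes fixed-pitch characters, no word wrapping, a two-space indent, and lines without trailing whitespace. None of these details are specified in the embodiment, and both function names are hypothetical.

```python
def split_paragraphs(lines, area_width, indent="  "):
    """Group line indices of a character area into paragraphs.

    A paragraph starts at an indented line. It ends at a line that ends
    with some space (shorter than the area width), at the line before the
    next indented line, or at the last line of the area.
    """
    paragraphs, current = [], []
    for i, line in enumerate(lines):
        if line.startswith(indent) and current:
            paragraphs.append(current)  # previous paragraph ends here
            current = []
        current.append(i)
        if len(line.rstrip()) < area_width:
            paragraphs.append(current)  # line ends with some space
            current = []
    if current:
        paragraphs.append(current)      # last line of the area
    return paragraphs


def insert_and_reflow(paragraph, line_no, column, text, area_width):
    """Insert text into one paragraph and move the following characters backward."""
    first = paragraph[0]
    indent = len(first) - len(first.lstrip(" "))
    stream = first[indent:] + "".join(paragraph[1:])
    offset = sum(len(line) for line in paragraph[:line_no]) + column - indent
    stream = stream[:offset] + text + stream[offset:]
    head_len = area_width - indent
    reflowed = [" " * indent + stream[:head_len]]
    rest = stream[head_len:]
    reflowed += [rest[i:i + area_width] for i in range(0, len(rest), area_width)]
    return reflowed


lines = ["  aaaaaaaa", "aaaaaaaaaa", "aaaa",
         "  bbbbbbbb", "bbbbbbbbbb",
         "  cccccccc", "cccccc"]
paragraphs = split_paragraphs(lines, area_width=10)
edited = insert_and_reflow(lines[0:3], line_no=1, column=3, text="XY", area_width=10)
```

Because the reflow is confined to the lines of one paragraph, the indent of its first line and the boundaries of the neighbouring paragraphs are left unchanged.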
The document editing section 18f can recognize a “heading” line in a character area by character recognition in S106. More specifically, as part of the character recognition, the document editing section 18f recognizes a specific style such as “Chapter . . . ” and a change in character size. In a case where paragraphs in the area that follow the “heading” are indented, therefore, the document editing section 18f can be prevented from falsely detecting that each of the lines in the area that follow the “heading” constitutes a paragraph. For example, in a case where the document editing section 18f does not recognize a line 61 in an area 60 as a “heading”, the document editing section 18f recognizes each of the following lines as a paragraph as illustrated in
When determining in S134 that the instruction is a “graphic addition”, the document editing section 18f specifies a position to which a handwritten graphic from the currently-selected target handwriting is to be added (S139).
More specifically, in S139, the document editing section 18f specifies a new layout of areas based on the specific layout information 17b, the original document layout information, and the position of the currently-selected target handwriting in the markup document. For example, if the start position of the currently-selected target handwriting is located close to the start positions of separate areas that are aligned in the left-right direction of the image being edited, the document editing section 18f brings the start position of the currently-selected target handwriting in alignment with the start positions of the separate areas. Although starting positions of areas in the left-right direction of the image being edited have been described above, the same is true of center positions and end positions of areas in the left-right direction of the image being edited, and start positions and end positions of areas in the top-bottom direction of the image being edited. The document editing section 18f may separate the area of the currently-selected target handwriting from an area adjacent thereto by the same distance as the distance between areas located close to the area of the currently-selected target handwriting. In a case where no regularity is found for the currently-selected target handwriting in terms of the start position, the center position, and the end position of the area thereof in the left-right direction of the image being edited as well as the start position and end position in the top-bottom direction of the image being edited, the document editing section 18f may specify the position of the handwriting of the currently-selected target handwriting as the position to which the handwritten graphic from the currently-selected target handwriting is to be added.
After the step S139, the document editing section 18f adds the handwritten graphic from the currently-selected target handwriting to the position, which is specified in S139, in the image being edited (S140).
After adding the handwritten graphic from the currently-selected target handwriting, the document editing section 18f for example moves downward characters and/or graphics that are located downward of an area including the added graphic as necessary by a size of the area including the added graphic.
When determining in S134 that the instruction is a “deletion” such as the instruction 33, 36, or 38, the document editing section 18f specifies a character or a graphic instructed to be deleted by the currently-selected target handwriting (S141).
Next, the document editing section 18f deletes the character or the graphic, which is specified in S141, from the image being edited (S142).
In a case where a character or a graphic in the middle of an area is to be deleted, for example, the document editing section 18f deletes the character or the graphic from the area, and accordingly moves forward characters and/or graphics that are located backward of the deleted character or graphic in the area by the extent of the deleted character or graphic. In a case where a character is deleted from a paragraph in an area, and characters and/or graphics following the deleted character are moved forward, the document editing section 18f maintains the paragraph after moving forward the characters and/or graphics. The document editing section 18f can recognize a “heading” line in a character area. In a case where paragraphs in the character area that follow the “heading” are indented, therefore, the document editing section 18f can be prevented from falsely detecting that each of the lines in the character area that follow the “heading” constitutes a paragraph. Furthermore, after deleting a character or a graphic specified in an area, the document editing section 18f moves forward areas that are located downward of the area including the deleted character or graphic as necessary by a decrease in the size of the area as a result of the deletion of the specified character or graphic.
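The movement of areas after a deletion can be sketched as follows. The function name, the dictionary representation of areas as `(top, bottom)` pairs, and the line height of 10 are assumptions for illustration.

```python
def shrink_and_close_up(areas, target, decrease):
    """Shrink one area and move the areas below it upward by the same amount.

    areas: hypothetical mapping from area name to (top, bottom).
    target: name of the area from which a character or graphic was deleted.
    decrease: decrease in the height of the target area.
    """
    _, target_bottom = areas[target]
    result = {}
    for name, (top, bottom) in areas.items():
        if name == target:
            result[name] = (top, bottom - decrease)
        elif top >= target_bottom:
            # Located downward of the target area: moved by the decrease.
            result[name] = (top - decrease, bottom - decrease)
        else:
            result[name] = (top, bottom)
    return result


# A deletion reduces the area 42 by one line (assumed line height: 10).
areas = {"41": (10, 20), "42": (30, 60), "43": (70, 100)}
after_deletion = shrink_and_close_up(areas, target="42", decrease=10)
```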
When determining in S134 that the instruction is a “move” such as the instruction 34 or 37, the document editing section 18f specifies a character or a graphic instructed to be moved by the currently-selected target handwriting (S143).
Next, the document editing section 18f specifies a position of a move destination instructed by the currently-selected target handwriting (S144).
Next, the document editing section 18f moves the character or the graphic specified in S143 to the position, which is specified in S144, in the image being edited (S145).
In a case where the character or the graphic specified in S143 is moved to the position specified in S144, the document editing section 18f for example moves downward characters and/or graphics that are located downward of an area including the move destination as necessary by the extent of the character or the graphic specified in S143. In a case where a distance between the area including the move destination and an area that is located immediately downward of the area including the move destination is greater than a specific distance, however, the area that is located immediately downward of the area including the move destination is not moved downward until the distance between these areas becomes equal to the specific distance. The specific distance is for example a distance equivalent to two lines of characters having a specific size. In a case where the character or the graphic specified in S143 is deleted at the move destination, the document editing section 18f moves upward areas that are located downward of the area including the deleted character or graphic as necessary by the extent of the deleted character or graphic. In a case where a character is added to a paragraph in an area, and accordingly characters that should follow the added character are moved backward, the document editing section 18f maintains the paragraph after moving backward the characters. In a case where a character is deleted from a paragraph in an area, and characters and/or graphics following the deleted character are moved forward, the document editing section 18f maintains the paragraph after moving forward the characters and/or graphics. The document editing section 18f can recognize a “heading” line in a character area. In a case where paragraphs in the area that follow the “heading” are indented, therefore, the document editing section 18f can be prevented from falsely detecting that each of the lines in the area that follow the “heading” constitutes a paragraph.
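The rule that a distance greater than the specific distance absorbs a downward move can be sketched as follows. The function name, the list representation of areas, and the numeric values are assumptions for illustration.

```python
def push_down(areas, grown_index, growth, specific_distance):
    """Grow one area and push the following areas downward.

    areas: list of [top, bottom] pairs ordered from top to bottom.
    A gap wider than specific_distance absorbs the push until the gap
    has shrunk to specific_distance. Narrower gaps are maintained.
    """
    result = [list(a) for a in areas]
    result[grown_index][1] += growth
    shift = growth
    for i in range(grown_index + 1, len(result)):
        gap = areas[i][0] - areas[i - 1][1]  # gap before the edit
        if gap > specific_distance:
            shift -= min(shift, gap - specific_distance)
        result[i][0] += shift
        result[i][1] += shift
    return result


areas = [[0, 10], [20, 30], [80, 90]]
# The wide gap of 50 absorbs a growth of 15, so the last area stays put.
small_growth = push_down(areas, grown_index=0, growth=15, specific_distance=24)
# A growth of 40 exceeds the slack of 26, so the last area moves by 14.
large_growth = push_down(areas, grown_index=0, growth=40, specific_distance=24)
```

In the second case the gap above the last area ends up exactly equal to the specific distance, which is the point at which that area starts to move.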
After the step S138, S140, S142, or S145, the document editing section 18f determines whether or not the added handwritings divided in S132 include any added handwriting that has not been selected as a target yet (S146).
When determining in S146 that the added handwritings divided in S132 include an added handwriting that has not been selected as a target yet, the document editing section 18f updates the original document layout information (S147) and performs a step S133 illustrated in
When determining in S146 that the added handwritings divided in S132 include no more added handwriting that has not been selected as a target yet, the document editing section 18f ends the operation illustrated in
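The loop over S133 to S147 can be sketched as follows. This is a heavily simplified sketch: the document is modelled as a flat list of named items, instructions are plain dictionaries with hypothetical keys (`type`, `content`, `target`, `after`), and the layout information is reduced to an index lookup.

```python
def edit_document(items, instructions):
    """Apply divided handwritten instructions one at a time.

    The layout lookup is rebuilt before each instruction so that later
    instructions resolve against the already edited document (cf. S147).
    """
    edited = list(items)
    for ins in instructions:                       # S133: select the next target
        layout = {item: i for i, item in enumerate(edited)}
        kind = ins["type"]                         # S134: determine the type
        if kind in ("character_addition", "graphic_addition"):
            edited.insert(layout[ins["after"]] + 1, ins["content"])
        elif kind == "deletion":
            edited.pop(layout[ins["target"]])
        elif kind == "move":
            moved = edited.pop(layout[ins["target"]])
            edited.insert(edited.index(ins["after"]) + 1, moved)
        else:
            raise ValueError(f"unknown instruction type: {kind}")
    return edited                                  # S146: no targets remain


items = ["title", "para1", "figA", "figB"]
instructions = [
    {"type": "character_addition", "content": "para2", "after": "para1"},
    {"type": "deletion", "target": "figA"},
    {"type": "move", "target": "figB", "after": "title"},
]
digitized = edit_document(items, instructions)
```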
When digitizing a document based on the markup document illustrated in
The characters “of” are added to the area 41 in accordance with the instruction 32. The start position of the area 41 in the left-right direction, and the start position and the end position of the area 41 in the top-bottom direction are not changed.
The three characters “bbb” are deleted from the area 42 in accordance with the instruction 33. The line including the characters “ccc” and the line including the characters “ddddd” are swapped in the area 42 in accordance with the instruction 34. The start position and the end position of the area 42 in the left-right direction, and the start position of the area 42 in the top-bottom direction are not changed. The area 42 is reduced by one line, and accordingly the end position of the area 42 in the top-bottom direction is moved upward by one line.
The characters “ttttt” are added to the area 43 in accordance with the instruction 35. The start position and the end position of the area 43 in the left-right direction, and the end position of the area 43 in the top-bottom direction are not changed. As a result of the area 42 being reduced by one line, the start position of the area 43 in the top-bottom direction is moved upward by one line.
The area 45 is deleted in accordance with the instruction 38.
The area 46 is deleted in accordance with the instruction 36.
The area 47 is moved in accordance with the instruction 37. The center position of the area 47 in the left-right direction is not changed. A distance 70 between the end position of the area 47 in the top-bottom direction and the start position of the area 44 in the top-bottom direction is equal to the distance 56 between the area 44 and the area 46 in the image 40 of the raw original document.
The characters “1/2” are added to the header in the area 49 in accordance with the instruction 31. The document editing section 18f sets the layout within the header in accordance with the specific layout information 17b.
As described above, the MFP 10 generates the digitized document by altering the position of at least some of the characters and the graphics included in the raw original document of the markup document. Thus, the adequacy of the layout of the digitized document based on the markup document can be improved.
The MFP 10 generates the digitized document by editing the raw original document in accordance with the layout plan of the raw original document. Thus, the adequacy of the layout of the digitized document based on the markup document can be improved.
When performing at least one of a character addition or a character deletion on a paragraph of the raw original document, the MFP 10 maintains the paragraph after the editing of the raw original document. Thus, the adequacy of the layout of the digitized document based on the markup document can be further improved.
The MFP 10 can reproduce the raw original document from the markup document even if the raw original document itself is not available. Thus, usability can be improved. Alternatively, the MFP 10 may store the image of the raw original document in the storage section 17 and use the image of the raw original document stored in the storage section 17 without reproducing the raw original document from the markup document.
Some steps of the document digitizing method according to the present disclosure may for example be implemented by a computer such as a personal computer (PC) instead of the MFP 10.
Although the present embodiment has been described using an example in which the image forming apparatus of the present disclosure is an MFP, the image forming apparatus may be any image forming apparatus other than an MFP.
Number | Date | Country | Kind |
---|---|---|---|
2016-149068 | Jul 2016 | JP | national |