The present disclosure relates to an image processing system, an image processing method, and a storage medium.
A conventional technique for detecting an alteration in image data is known.
Japanese Patent Application Laid-Open No. 2009-200794 discusses an image processing apparatus for determining an alteration. The image processing apparatus divides image data into a plurality of groups each including pixels having close luminance values and performs character recognition processing for each group. If a result of performing the character recognition processing for each group and a result of performing the character recognition processing without grouping the image data are different from each other, the image processing apparatus determines that the image has been altered.
The technique discussed in Japanese Patent Application Laid-Open No. 2009-200794 determines the presence or absence of an alteration based only on the luminance. Thus, the technique may make a false determination of an alteration due to a luminance change of characters caused by ink blur or writing pressure, even though the characters have not been altered.
The present disclosure has been devised in view of the above-described issue and is directed to reducing the possibility of false determination for a character string alteration in image data.
The image processing system according to embodiments of the present disclosure includes a generation unit configured to generate a learning model by performing machine learning processing based on an altered image, an image before an alteration, and an image representing a difference between the altered image and the image before the alteration, an input unit configured to input image data, and an estimation unit configured to estimate whether the image data input by the input unit includes an altered image, by using the learning model generated by the generation unit.
Further features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Exemplary embodiments will be described in detail below with reference to the accompanying drawings. The following exemplary embodiments do not limit the present disclosure. Although a plurality of features is described in the exemplary embodiments, not all of the plurality of features is indispensable, and the plurality of features may be combined in an arbitrary way. In addition, in the accompanying drawings, identical or similar configurations are assigned the same reference numerals, and duplicated descriptions thereof will be omitted.
<<1. System Overview>>
The image processing apparatus 101 may be, for example, a Multi-Function Peripheral (MFP) having printing and image reading functions or a digital scanner dedicated to image reading. The image processing apparatus 101 includes a reading unit 111 and a display control unit 112. The reading unit 111 reads a document 11 to generate a read image. More specifically, the reading unit 111 acquires the read image of the document 11. The document 11 typically includes character strings, and thus the read image includes character images.
For example, the image processing apparatus 101 can support, in the learning stage, the generation of learning data. More specifically, an operator handwrites characters in a prepared blank learning document and sets the handwritten learning document to the image processing apparatus 101. The learning document may be, for example, a form document having write-in columns at one or more predetermined positions. The learning document may have visual identification information (e.g., a printed number, bar code, or two-dimensional code) for uniquely identifying each individual learning document. The image processing apparatus 101 may also be able to print a blank learning document. The reading unit 111 reads the set learning document to generate a read image 12. The read image 12 is handled as an image of the original of the learning document. The read image 12 is also referred to as an original learning image in the present specification. The operator or another person alters (e.g., writes strokes with a pen on) the written learning document (i.e., the original) and sets the altered learning document to the image processing apparatus 101. The reading unit 111 reads the altered learning document to generate a read image 13. In the present specification, the read image 13 is also referred to as an altered learning image. A plurality of pairs of the original learning image 12 and the altered learning image 13 is generated by repeating a sequence of handwriting characters in the learning document, reading the original (learning document) with the image processing apparatus 101, intentionally altering the learning document, and reading the altered learning document (an altered version) with the image processing apparatus 101. The image processing apparatus 101 transmits these pairs of the original learning image 12 and the altered learning image 13 to the learning apparatus 102 via the network 105. The learning apparatus 102 performs machine learning by using learning data generated from these pairs as described below. Note that the generation of the original learning image 12 and the altered learning image 13 may also be performed by an apparatus different from the image processing apparatus 101.
In the alteration detection stage, the image processing apparatus 101 reads the target document including handwritten characters to generate a read image 21. According to the present specification, the read image 21 is also referred to as a processing target image. The image processing apparatus 101 transmits the generated processing target image 21 to the alteration detection server 103 via the network 105. The display control unit 112 of the image processing apparatus 101 receives detection result data 32 from the alteration detection server 103. The result data 32 indicates the result of the alteration detection performed by using the processing target image 21. The display control unit 112 then controls the screen display of the alteration detection result based on the detection result data 32. Various examples of the display control will be specifically described below.
The learning apparatus 102 may be an information processing apparatus, such as a computer or a workstation, that performs supervised learning processing. The learning apparatus 102 includes a data processing unit 121, a learning unit 122, and a storage unit 123. The data processing unit 121 accumulates the above-described pairs of the original learning image 12 and the altered learning image 13 generated by the image processing apparatus 101 (or another apparatus) in the storage unit 123. The data processing unit 121 generates learning data based on the accumulated pairs. The learning unit 122 generates and/or updates a learned model (learning model) 41 for alteration detection through machine learning processing using the learning data generated based on the learning images (i.e., the pairs of the original learning image 12 and the altered learning image 13), which are read images of the learning document. The learning unit 122 causes the storage unit 123 to store the generated and/or updated learned model 41. For example, if a neural network model is used as a machine learning model, the learned model 41 is a data set including parameters, such as the weight and bias for each node of the neural network. Deep learning based on a multilayered neural network may be used as an example of a technique of machine learning for generating and/or updating the neural network model. Some examples of generation of learning data and generation and/or update of a learned model will be specifically described below. The learning unit 122 provides the alteration detection server 103 with the learned model 41 in response to a request from the alteration detection server 103 described below.
The alteration detection server 103 may be an information processing apparatus, such as a computer or a workstation. The alteration detection server 103 detects an altered portion included in the target document by using the processing target image 21 received from the image processing apparatus 101. The alteration detection server 103 includes an image acquisition unit 131 and a detection unit 132. The image acquisition unit 131 acquires the processing target image 21, which is a read image of the target document. The detection unit 132 detects an altered portion included in the target document using the processing target image 21. According to the present exemplary embodiment, the detection unit 132 uses the above-described learned model 41 offered from the learning apparatus 102 for the alteration detection. In the present exemplary embodiment, the detection unit 132 estimates whether each of a plurality of pixels in the processing target image 21 belongs to an altered portion by using the learned model 41 (i.e., alteration detection for each pixel). Alternatively, a modification in which the detection unit 132 determines whether each of one or more characters in the processing target image 21 includes an altered portion (i.e., alteration detection for each character) will be described below. As a result of the alteration detection, the detection unit 132 generates detection result data 32 indicating which portion of the processing target image 21 is determined to have been altered, and provides the image processing apparatus 101 with the generated detection result data 32. The detection result data 32 may include, for example, bitmap data indicating whether each pixel of the processing target image 21 belongs to the altered portion. The alteration detection result indicated by the detection result data 32 is presented to the user by the image processing apparatus 101, and the user validates the result. To support the validation by the user, the image processing apparatus 101 or the alteration detection server 103 generates an emphasized image that emphasizes, in the processing target image 21, the pixels determined to belong to the altered portion. In a case where the alteration detection server 103 generates the emphasized image, the alteration detection server 103 transmits the generated emphasized image, together with the above-described bitmap data, to the image processing apparatus 101.
According to the present exemplary embodiment, the processing target image 21 can be applied to the learned model 41 on a character basis. The alteration detection server 103 thus transmits the processing target image 21 to the OCR server 104 to request the OCR server 104 to recognize the characters included in the processing target image 21. The OCR server 104 may be an information processing apparatus, such as a computer or a workstation. The OCR server 104 performs optical character recognition (OCR) in response to the request from the alteration detection server 103. The OCR server 104 includes a character recognition unit 141. The character recognition unit 141 performs OCR on the processing target image 21 using a known technique, thereby recognizing the characters and character area positions in the processing target image 21. The character recognition unit 141 transmits recognition result data 31 indicating the recognition result to the alteration detection server 103.
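For illustration only, the character recognition performed by the character recognition unit 141 can be sketched as follows. The sketch assumes the open-source pytesseract/Tesseract stack and word-level bounding boxes; the actual OCR technique, interfaces, and granularity are not prescribed by the present disclosure.

```python
# Minimal OCR sketch: recognize text elements and their bounding boxes in a read image.
# Assumes the open-source pytesseract/Tesseract stack (an assumption for illustration);
# the character recognition unit 141 may use any OCR implementation.
from PIL import Image
import pytesseract


def recognize_characters(image_path):
    """Return a list of (text, (left, top, width, height)) tuples (word-level boxes)."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    results = []
    for text, left, top, width, height in zip(
            data["text"], data["left"], data["top"], data["width"], data["height"]):
        if text.strip():  # skip empty detections
            results.append((text, (left, top, width, height)))
    return results


# Example: recognition_result = recognize_characters("processing_target.png")
```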
<<2. Apparatus Configuration>>
(1) Image Processing Apparatus
The image processing apparatus 101 includes a central processing unit (CPU) 201, a read only memory (ROM) 202, a random access memory (RAM) 204, a printer device 205, a scanner device 206, a conveyance device 207, a storage 208, an input device 209, a display device 210, and an external interface (I/F) 211. A data bus 203 is a communication line for mutually connecting these devices included in the image processing apparatus 101.
The CPU 201 is configured to control the entire image processing apparatus 101. The CPU 201 executes a boot program stored in the ROM 202, which is a nonvolatile memory, to activate the operating system (OS) of the image processing apparatus 101. The CPU 201 executes a controller program stored in the storage 208 under the OS. The controller program is a program for controlling each of the devices of the image processing apparatus 101. The RAM 204 is used as the main memory device for the CPU 201. The RAM 204 provides the CPU 201 with a temporary storage area (i.e., a work area).
The printer device 205 is configured to print an image on paper (also referred to as a recording material or a sheet). The printer device 205 may employ an electrophotographic method using photosensitive drums or a photosensitive belt, an inkjet method for discharging ink from a micronozzle array to directly print an image on paper, or any other printing method. The scanner device 206 includes an optical reading device, such as a Charge Coupled Device (CCD), for optically scanning a document, and converts the electrical signal supplied from the optical reading device into image data of a read image. The conveyance device 207, which may be an Automatic Document Feeder (ADF), conveys documents set on the ADF one by one to the scanner device 206. The scanner device 206 may be able to read not only a document conveyed from the conveyance device 207 but also a document placed on the document positioning plate (not illustrated) of the image processing apparatus 101.
The storage 208 may be a writable/readable auxiliary storage device including a nonvolatile memory, such as a Hard Disk Drive (HDD) or a Solid State Drive (SSD). The storage 208 stores various types of data including the above-described controller program, setting data, and image data. The input device 209, such as a touch panel or hardware keys, receives a user input, such as an operation instruction or information input from the user. The input device 209 transmits an input signal representing the contents of the received user input to the CPU 201. The display device 210, such as a Liquid Crystal Display (LCD) or a Cathode Ray Tube (CRT), displays an image (e.g., a user interface image) generated by the CPU 201 on the screen. For example, the CPU 201 may determine what operation has been performed by the user, based on the pointing position indicated by an input signal received from the input device 209 and the allocation of the user interface displayed by the display device 210. According to the determination result, the CPU 201 controls the operation of the corresponding device or changes the contents displayed by the display device 210.
The external interface (I/F) 211 transmits and receives various types of data including image data to/from an external apparatus via the network 105. The network 105 may be, for example, a Local Area Network (LAN), a telephone line, a proximity wireless (e.g., infrared) network, or any other type of network. The external I/F 211 can receive Page Description Language (PDL) data describing drawing contents for printing from an external apparatus, such as the learning apparatus 102 or a personal computer (PC) (not illustrated). The CPU 201 interprets the PDL data received by the external I/F 211 to generate image data. The image data can be transmitted to the printer device 205 for printing or to the storage 208 for storage. The external I/F 211 can transmit the image data of the read image acquired by the scanner device 206 to the alteration detection server 103 for alteration detection, and receive the detection result data 32 from the alteration detection server 103.
(2) Learning Apparatus
The learning apparatus 102 includes a CPU 231, a ROM 232, a RAM 234, a storage 235, an input device 236, a display device 237, an external I/F 238, and a Graphics Processing Unit (GPU) 239. The data bus 233 is a communication line for mutually connecting these devices included in the learning apparatus 102.
The CPU 231 is configured to control the entire learning apparatus 102. The CPU 231 executes a boot program stored in the ROM 232, which is a nonvolatile memory, to activate the OS of the learning apparatus 102. The CPU 231 executes, on this OS, a learning data generation program and a learning program stored in the storage 235. The learning data generation program is a program for generating learning data based on a pair of the original learning image 12 and the altered learning image 13. The learning program is a program for generating and/or updating a learned model (e.g., a neural network model) for alteration detection through machine learning. The RAM 234 is used as the main memory device for the CPU 231 and provides the CPU 231 with a temporary storage area (i.e., a work area).
The storage 235 may be a writable/readable auxiliary storage device including a nonvolatile memory, such as an HDD or an SSD. The storage 235 stores various types of data including the above-described programs, learning images, learning data, and model data. The input device 236, such as a mouse or a keyboard, receives a user input, such as an operation instruction or information input from the user. The display device 237, such as an LCD or a CRT, displays an image generated by the CPU 231 on the screen. The external I/F 238 transmits and receives data related to the learning processing to/from an external apparatus via the network 105. The external I/F 238 can receive, for example, a pair of the original learning image 12 and the altered learning image 13 from the image processing apparatus 101. The external I/F 238 can transmit the learned model 41 generated and/or updated through machine learning to the alteration detection server 103. The GPU 239, which is a processor capable of performing advanced parallel processing, accelerates the learning processing for generating and/or updating the learned model 41 in collaboration with the CPU 231.
(3) Alteration Detection Server
The alteration detection server 103 includes a CPU 261, a ROM 262, a RAM 264, a storage 265, an input device 266, a display apparatus 267, and an external I/F 268. The data bus 263 is a communication line for mutually connecting these devices included in the alteration detection server 103.
The CPU 261 is a controller for controlling the entire alteration detection server 103. The CPU 261 executes a boot program stored in the ROM 262, which is a nonvolatile memory, to activate the OS of the alteration detection server 103. The CPU 261 executes, on this OS, an alteration detection program stored in the storage 265. The alteration detection program is a program for detecting an altered portion included in the target document by using the read image (i.e., the processing target image) of the target document acquired from a client apparatus (e.g., the image processing apparatus 101). The RAM 264 is used as the main memory device for the CPU 261. The RAM 264 provides the CPU 261 with a temporary storage area (i.e., a work area).
The storage 265 may be a writable/readable auxiliary storage device including a nonvolatile memory, such as an HDD or an SSD. The storage 265 stores various types of data, such as the above-described programs, image data, and the detection result data 32. The input device 266, such as a mouse or a keyboard, receives a user input, such as an operation instruction or information input from the user. The display apparatus 267, such as an LCD or a CRT, displays an image generated by the CPU 261 on the screen. The external I/F 268 transmits and receives data related to the alteration detection to/from an external apparatus via the network 105. The external I/F 268 can, for example, receive the processing target image 21 from the image processing apparatus 101 and transmit the detection result data 32 to the image processing apparatus 101. The external I/F 268 can transmit a request for offering the learned model 41 to the learning apparatus 102 and receive the learned model 41 from the learning apparatus 102. The external I/F 268 can transmit a request for performing the OCR to the OCR server 104 and receive the recognition result data 31 indicating the OCR result from the OCR server 104.
Although not illustrated in
<<3. Processing Flow>>
<3-1. Learning Stage>
In step S301, the operator sets, in the learning stage, a learning document filled with handwritten characters on the image processing apparatus 101 and instructs the image processing apparatus 101 to read the document. In this case, the operator inputs information indicating that the set learning document is the original, which is not altered, to the image processing apparatus 101 via the input device 209. In step S302, according to the operator's instruction, the reading unit 111 of the image processing apparatus 101 reads the set learning document to generate the read image 12. The reading unit 111 attaches a flag indicating that the read image 12 is the original learning image to the read image 12. In step S303, the operator sets the altered learning document to the image processing apparatus 101 and instructs the image processing apparatus 101 to read the document. In this case, the operator inputs information indicating that the set learning document includes the altered portion to the image processing apparatus 101 via the input device 209. In step S304, according to the operator's instruction, the reading unit 111 of the image processing apparatus 101 reads the set learning document to generate the read image 13. The reading unit 111 attaches a flag indicating that the read image 13 is an altered learning image to the read image 13. In steps S302 and S304, the reading unit 111 reads identification information included in the learning document to recognize that the original learning image 12 and the altered learning image 13 are a pair of the original and altered versions of the same learning document. The reading unit 111 associates a document identifier (ID) for identifying the recognized learning document with the original learning image 12 and the altered learning image 13. The reading unit 111 further associates a data set ID for identifying the unit of learned model generation/update with the original learning image 12 and the altered learning image 13. As an example, when generating and/or updating one learned model for each image processing system, the data set ID may be an identifier for uniquely identifying the image processing apparatus 101. As another example, when generating and/or updating one learned model for each user, the data set ID may be an identifier for uniquely identifying each user. As yet another example, when generating and/or updating one learned model for each user group, the data set ID may be an identifier for uniquely identifying each user group. In step S305, the reading unit 111 transmits these learning images and the related data (e.g., the original learning image 12, the altered learning image 13, the flags, the document ID, and the data set ID) generated in this way to the learning apparatus 102.
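As an illustration of the related data attached in steps S302 to S305, the following hypothetical record bundles a read learning image with its original/altered flag, document ID, and data set ID. The field names and types are assumptions for illustration only and are not prescribed by the present disclosure.

```python
# Hypothetical record bundling a learning image with the related data described above
# (original/altered flag, document ID, data set ID). Field names are illustrative.
from dataclasses import dataclass


@dataclass
class LearningImageRecord:
    image_path: str    # path to the read image (original or altered learning image)
    is_altered: bool   # False for the original learning image 12, True for the altered learning image 13
    document_id: str   # identifies the individual learning document (read from its identification information)
    data_set_id: str   # identifies the unit of learned-model generation/update (apparatus, user, or user group)


# Example pair for one learning document:
# original = LearningImageRecord("doc0001_original.png", False, "doc0001", "mfp-101")
# altered  = LearningImageRecord("doc0001_altered.png",  True,  "doc0001", "mfp-101")
```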
In step S305 illustrated in
The learning images may include an original learning image 12 without a corresponding altered learning image 13. In this case, the data processing unit 121 can generate a partial image clipped from the character write-in column 402a of the original learning image 12 as an input learning image, and generate, as a teacher image, a binary image of the same size in which all pixels indicate false (i.e., indicating that the entire image includes no altered portion).
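For illustration, the generation of an input learning image and its teacher image can be sketched as follows, assuming aligned grayscale crops of the character write-in column and a simple pixel-difference threshold; the actual alignment and thresholding used by the data processing unit 121 are not prescribed here.

```python
# Sketch of learning-data generation: the input learning image is a partial image clipped
# from a character write-in column, and the teacher image is a binary image marking altered
# pixels (the difference between the altered and original crops). Assumes the two crops are
# already aligned; the threshold value is illustrative.
import numpy as np


def make_learning_pair(original_crop, altered_crop, diff_threshold=32):
    """original_crop, altered_crop: aligned grayscale crops (uint8 ndarrays) of the
    same write-in column. Returns (input_image, teacher_image)."""
    diff = np.abs(altered_crop.astype(np.int16) - original_crop.astype(np.int16))
    teacher = diff > diff_threshold   # True where strokes were added by the alteration
    return altered_crop, teacher


def make_unaltered_pair(original_crop):
    """For an original learning image with no corresponding altered image,
    the teacher image is all False (no altered portion)."""
    teacher = np.zeros(original_crop.shape, dtype=bool)
    return original_crop, teacher
```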
The data processing unit 121 generates a plurality of the above-described input images and corresponding teacher images based on the plurality of pairs of the original learning image 12 and the altered learning image 13 related to the same data set ID. In step S308, the learning unit 122 repetitively performs the learning processing using these input images and teacher images within the range of the same data set ID, thereby generating and/or updating the learned model 41 for alteration detection. The learned model 41 may be, for example, a fully convolutional network (FCN) model, although it is not limited to a certain type of model. For example, one repetition of the learning processing can include the following procedures: inputting an input image to the model, calculating an error between the output data calculated by the model (having temporary parameter values) and the teacher data, and adjusting the parameter values so as to reduce the error. For example, the cross entropy can be used as an index of the error. For example, the back-propagation method can be used as a technique for adjusting the parameter values. The learning unit 122 can repeat the learning processing until the convergence of learning is determined or until the number of repetitions reaches an upper limit. Then, the learning unit 122 stores the generated and/or updated learned model 41 (a set of model parameters configuring the learned model 41) in the storage unit 123 in association with the corresponding data set ID. The learning unit 122 may generate and/or update different learned models 41 for two or more different data set IDs. The learning unit 122 may update the previously generated and/or updated learned model 41 through additional learning processing using a newly acquired learning image. The learning unit 122 may also select the learning data to be input to the learning processing through any one of the online learning, batch learning, and mini-batch learning methods.
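For illustration, one repetition of the learning processing described above can be sketched as follows, assuming PyTorch, a toy fully convolutional network, and binary cross entropy on per-pixel logits; the actual network structure, loss, and optimizer are design choices not prescribed by the present disclosure.

```python
# Minimal sketch of the repeated learning processing for a pixel-wise alteration detector,
# assuming PyTorch and a toy fully convolutional network (FCN).
import torch
import torch.nn as nn


class TinyFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),  # per-pixel logit: altered or not
        )

    def forward(self, x):
        return self.net(x)


def train_step(model, optimizer, input_image, teacher_image):
    """One repetition: forward pass, cross-entropy error, back-propagation."""
    criterion = nn.BCEWithLogitsLoss()        # binary cross entropy on per-pixel logits
    optimizer.zero_grad()
    logits = model(input_image)               # shape (N, 1, H, W)
    loss = criterion(logits, teacher_image)   # teacher_image: float tensor of 0/1, same shape
    loss.backward()                           # back-propagation
    optimizer.step()                          # adjust parameter values to reduce the error
    return loss.item()


# model = TinyFCN()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = train_step(model, optimizer, inputs, teachers)  # repeated until convergence
```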
<3-2. Alteration Detection Stage>
(1) Schematic Processing Flow
In the alteration detection stage, in step S351, the user sets the target document to the image processing apparatus 101 and instructs the image processing apparatus 101 to read the document. The user may be identical to or different from the operator involved in the learning stage. In step S352, the reading unit 111 of the image processing apparatus 101 reads the set target document to generate the read image 21 according to a user instruction. In step S353, the user instructs the image processing apparatus 101 to detect an alteration for the target document. In step S354, the reading unit 111 attaches a flag indicating that the read image 21 is the processing target image, to the read image 21, and acquires setting data related to the alteration detection (e.g., from a memory), according to an instruction for detecting an alteration. The acquired setting data can include, for example, the data set ID (e.g., an identifier for identifying the image processing apparatus 101 and the user or user group) for identifying the learned model to be used for the alteration detection. In step S355, the reading unit 111 transmits the processing target image 21 and related data, together with the alteration detection request, to the alteration detection server 103.
In step S355, the image acquisition unit 131 of the alteration detection server 103 receives the processing target image 21, which is the read image of the target document, and the related data, together with the alteration detection request, from the image processing apparatus 101. The image acquisition unit 131 outputs the received image and data to the detection unit 132. In step S356, the detection unit 132 requests the learning apparatus 102 to offer the latest learned model 41. The model request to be transmitted to the learning apparatus 102 can include the data set ID. Upon reception of the model request, in step S357, the learning unit 122 of the learning apparatus 102 reads the latest learned model 41 from the storage unit 123 and transmits the read learned model 41 to the detection unit 132. The latest learned model 41 is identified, for example, by the data set ID. In step S358, the detection unit 132 transmits the processing target image 21 to the OCR server 104 to request the OCR server 104 to recognize characters included in the processing target image 21. Upon reception of the OCR request, in step S359, the character recognition unit 141 of the OCR server 104 subjects the processing target image 21 to the OCR to recognize the characters and character area positions in the processing target image 21. In step S360, the character recognition unit 141 transmits the recognition result data 31 indicating the recognition result to the detection unit 132. In step S361, the detection unit 132 applies the processing target image 21 to the learned model 41 offered from the learning apparatus 102, thereby detecting an altered portion included in the target document. As described above, the learned model 41 is a model generated and/or updated through machine learning using learning images that are read images of the learning document. In the alteration detection processing, for example, the detection unit 132 applies, to the learned model 41, the partial image of the processing target image 21 corresponding to each character area recognized as a result of the OCR. Bitmap data indicating whether each pixel belongs to the altered portion is thereby generated for the character area of each character recognized in the processing target image 21. In step S362, the detection unit 132 transmits the detection result data 32 to the image processing apparatus 101. The detection result data 32 includes integrated bitmap data having the same size as the processing target image 21, which indicates whether each pixel belongs to the altered portion and is generated by integrating the bitmap data obtained for each character area. In the following descriptions, this integrated bitmap data is referred to as a detection result image. The detection unit 132 may additionally generate an emphasized image that emphasizes, in the processing target image 21, the pixels determined to belong to the altered portion as a result of the alteration detection (hereinafter referred to as altered pixels), and include the generated emphasized image in the detection result data 32.
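For illustration, the integration of the per-character-area bitmap data into a detection result image of the same size as the processing target image can be sketched as follows (NumPy and the tuple layout of the character results are assumptions for illustration).

```python
# Sketch of building the detection result image: per-character-area bitmaps are pasted back
# into a single binary map of the same size as the processing target image.
import numpy as np


def integrate_character_bitmaps(image_shape, character_results):
    """image_shape: (height, width) of the processing target image.
    character_results: iterable of ((left, top, width, height), bitmap) where bitmap is a
    boolean array of shape (height, width) for that character area."""
    detection_result = np.zeros(image_shape, dtype=bool)
    for (left, top, width, height), bitmap in character_results:
        detection_result[top:top + height, left:left + width] |= bitmap
    return detection_result
```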
In step S362 illustrated in
(2) Specific Processing Flow of Alteration Detection Stage (Image Processing Apparatus)
In step S801, the reading unit 111 reads a target document set on the conveyance device 207 by using the scanner device 206 to generate a processing target image. The processing target image may be, for example, a full color (3 channels of RGB) image. In step S802, the reading unit 111 receives, via the input device 209, an instruction for detecting an alteration input by the user. In step S803, the reading unit 111 transmits the processing target image and related data (e.g., the data set ID), together with the alteration detection request, to the alteration detection server 103 via the external I/F 211. In step S804, the display control unit 112 waits for reception of the detection result data 32 from the alteration detection server 103. Upon reception of the detection result data 32 from the alteration detection server 103 via the external I/F 211 (YES in step S804), the processing proceeds to step S805. In step S805, the display control unit 112 determines whether the detection result data 32 indicates that the target document includes an altered portion. When the display control unit 112 determines that the target document includes an altered portion (YES in step S805), the processing proceeds to step S806. In contrast, when the display control unit 112 determines that the target document does not include an altered portion (NO in step S805), the processing proceeds to step S810. In step S806, the display control unit 112 determines, based on a user input, whether to contrastively display the emphasized and the comparative images or to display only the emphasized image as the alteration detection result. If it is determined to contrastively display the emphasized and the comparative images (YES in step S806), the processing proceeds to step S807. If it is determined to display only the emphasized image (NO in step S806), the processing proceeds to step S808. In step S807, the display control unit 112 contrastively displays the emphasized image that emphasizes the altered portion in the processing target image and the comparative image that represents the altered portion as it is in the processing target image, on the screen of the display device 210. Examples of the contrastive display will be further described below with reference to
(3) Specific Processing Flow in Alteration Detection Stage (Alteration Detection Server)
In step S901, the image acquisition unit 131 receives the processing target image and related data (e.g., the data set ID), together with the alteration detection request, from the image processing apparatus 101 via the external I/F 268. In step S902, the detection unit 132 transmits a request for offering a learned model to the learning apparatus 102 via the external I/F 268 and acquires the learned model from the learning apparatus 102. The detection unit 132 acquires the learned model identified by the data set ID received, for example, together with the alteration detection request. The detection unit 132 builds a neural network model, for example, on the RAM 264 and reflects the values of the model parameters received from the learning apparatus 102 in the built model. In step S903, the detection unit 132 transmits a request for performing the OCR on the processing target image, together with the processing target image, to the OCR server 104 via the external I/F 268 and receives recognition result data representing the OCR result from the OCR server 104. In step S904, the detection unit 132 clips, from the processing target image, a character area image of one of the characters recognized in the processing target image and applies the clipped character area image to the learned model acquired in step S902. The detection unit 132 thereby determines whether each of the plurality of pixels in the character area image belongs to the altered portion. The character area image may be subjected to gray-scaling before being applied to the learned model. The result of the determination is bitmap data similar to the binary image 612 illustrated in
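For illustration, the clipping and model application in step S904 can be sketched as follows, reusing the toy FCN from the earlier sketch; the gray-scaling, normalization, and 0.5 probability threshold are assumptions for illustration only.

```python
# Sketch of step S904: clip one character area from the processing target image, gray-scale
# it, and apply the learned model to obtain a per-pixel bitmap. Any model returning per-pixel
# logits could be used; TinyFCN from the earlier sketch is assumed here.
import numpy as np
import torch


def detect_character_area(model, target_image_gray, box, threshold=0.5):
    """target_image_gray: 2-D uint8 ndarray of the processing target image (grayscale).
    box: (left, top, width, height) of one recognized character area.
    Returns a boolean bitmap of the same size as the clipped area."""
    left, top, width, height = box
    crop = target_image_gray[top:top + height, left:left + width].astype(np.float32) / 255.0
    with torch.no_grad():
        logits = model(torch.from_numpy(crop)[None, None, :, :])  # shape (1, 1, H, W)
        probs = torch.sigmoid(logits)[0, 0].numpy()
    return probs > threshold  # True where the pixel is estimated to belong to the altered portion
```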
In the above-described example, the detection unit 132 clips a character area image from the processing target image based on the OCR result. However, the OCR does not necessarily need to be used. For example, if the target document is a form having a known format, the detection unit 132 can clip, as a character area image, the image in a partial area (e.g., a square area) at a predetermined position from the processing target image according to the known format.
<<4. Details of Display Control>>
The detailed window 1150 illustrated in
When the user corrects the alteration detection result via the above-described correction window 1200, the display control unit 112 may update the learned model based on the corrected alteration detection result. More specifically, the display control unit 112 transmits a pair of the processing target image and the corrected detection result image, together with a model update request, to the learning apparatus 102. Upon reception of the model update request from the display control unit 112, the learning unit 122 of the learning apparatus 102 can update the learned model by using the character area image of each character area in the processing target image as an input image, and the character area image of the identical character area in the detection result image as a teacher image. Thus, relearning is performed for patterns of pixels that are likely to be mis-detected by the current learned model, thereby effectively improving the accuracy of the alteration detection by the learned model.
The display control unit 112 may support only a single comparison mode or dynamically switch the comparison mode for the contrastive display among a plurality of candidate comparison modes. In the case of a single comparison mode, the contrastive display window 1350 does not need to include the Change Comparison Mode button 1354. In the case of switching the comparison mode for the contrastive display, the candidate comparison modes can include, for example, two or more of the following modes:
a comparison mode C1 in which a partial image of the character area determined to include the altered portion is clipped from the read image and displayed as it is as the comparative image;
a comparison mode C2 in which a partial image including that character area and its peripheral area (i.e., the surrounding characters) is clipped from the read image and displayed as the comparative image;
a comparison mode C3 in which a partial image of another character area representing the identical character is displayed together with the partial image of that character area as the comparative image; and
a comparison mode C4 in which the partial image of that character area is displayed as the comparative image with the values of the pixels determined to belong to the altered portion suppressed (e.g., corrected to the background color).
The display control unit 112 may display on the screen a list of candidates of these comparison modes to enable the user to specify a desired comparison mode. Alternatively, the setting of the comparison mode may be toggled (sequentially changed) in a predetermined order by every user operation of the Change Comparison Mode button 1354. The display control unit 112 can change the contents of the comparative image to be displayed in the comparative image display area 1362 in the contrastive display window 1350 according to the comparison mode specified by the user in this way.
In step S1601, the display control unit 112 determines which of the character-based contrastive display and the overall contrastive display is specified as the display mode for the contrastive display. For example, when the user operates the Contrastive Display button 1154 of the detailed window 1150 illustrated in
In step S1602, the display control unit 112 acquires character area data representing the positions and sizes of one or more character areas in the image. For example, the display control unit 112 may receive, as the character area data, the recognition result data 31 indicating the result of the character recognition performed by the OCR server 104, together with the detection result data 32, from the alteration detection server 103. Alternatively, if the target document is a form having a known format, the display control unit 112 may acquire, from the storage 208, character area data including the predefined positions and sizes of the character areas in the known format. Subsequent steps S1603 to S1612 are repeated for each character area that is indicated by the character area data and includes the pixels determined to belong to the altered portion based on the detection result data 32. Referring to the processing target image 21a and the detection result image 32a illustrated in
In step S1603, the display control unit 112 selects one of the character areas including the altered portion. Hereinafter, the selected character area is referred to as a selected area. In step S1604, the display control unit 112 clips a partial image of the selected area from the emphasized image according to the position and size indicated in the character area data. In step S1605, the display control unit 112 determines whether the currently set comparison mode is the comparison mode C2. If the comparison mode C2 is currently set (YES in step S1605), the processing proceeds to step S1606. If the comparison mode C1, C3 or C4 is currently set (NO in step S1605), the processing proceeds to step S1607. In step S1606, the display control unit 112 clips, from the read image, a partial image including the selected area and the peripheral area outside the selected area. As an example, the size of the partial image including the peripheral area may be W times the size of the selected area in the horizontal direction and H times the size of the selected area in the vertical direction (the magnifications W and H are preset values larger than 1, e.g., W=4 and H=2). As another example, the peripheral area may be dynamically set as an area including N characters (N is a preset integer) in the vicinity of the selected area. In the comparison mode C2, the partial image clipped in step S1606 serves as the comparative image. In step S1607, the display control unit 112 clips a partial image of the selected area from the read image. In a case where the comparison mode C1 is currently set, the partial image clipped in step S1607 serves as the comparative image. In a case where the comparison mode C3 is currently set (YES in step S1608), the processing proceeds to step S1609. In step S1609, the display control unit 112 clips, from the read image, a partial image of one or more other character areas representing the identical character to the character represented in the selected area. The other character area is desirably an area not including the pixels determined to belong to the altered portion. The display control unit 112 may request the OCR server 104 to perform the OCR and identify another character area representing the identical character to the character represented in the selected area, based on the character recognition result returned from the OCR server 104. If such other character areas do not exist in the target document, step S1609 may be skipped. In the comparison mode C3, a combination of the partial images clipped in steps S1607 and S1609 (e.g., an image including the two partial images arranged side by side) serves as the comparative image. In a case where the comparison mode C4 is currently set (NO in step S1608, YES in step S1610), the processing proceeds to step S1611. In step S1611, the display control unit 112 suppresses the values of the pixels determined to belong to the altered portion in the partial image clipped in step S1607. An example of the suppression is that the display control unit 112 corrects the pixel values to the same color as the background color, such as white, or a color close to the background color. In the comparison mode C4, the partial image processed in this way serves as the comparative image.
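For illustration, the computation of the clip region including the peripheral area in step S1606 can be sketched as follows, assuming the W- and H-fold magnifications described above and clamping the region to the read-image bounds; the centering strategy is an assumption.

```python
# Sketch of step S1606: compute a clip region covering the selected character area and its
# peripheral area, using horizontal/vertical magnifications W and H (e.g., W=4, H=2) and
# clamping the region to the read-image bounds.
def peripheral_clip_box(selected_box, image_width, image_height, w_mag=4, h_mag=2):
    """selected_box: (left, top, width, height) of the selected character area.
    Returns the (left, top, width, height) of the enlarged clip region."""
    left, top, width, height = selected_box
    new_width, new_height = width * w_mag, height * h_mag
    # Center the enlarged region on the selected area, then clamp to the image.
    new_left = max(0, left - (new_width - width) // 2)
    new_top = max(0, top - (new_height - height) // 2)
    new_width = min(new_width, image_width - new_left)
    new_height = min(new_height, image_height - new_top)
    return new_left, new_top, new_width, new_height
```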
In step S1612, the display control unit 112 determines whether there remains an unprocessed character area including the pixels determined to belong to the altered portion. If there remains such an unprocessed character area (YES in step S1612), the processing returns to step S1603. The display control unit 112 then repeats the above-described steps S1603 to S1611 for the next character area. If there remains no unprocessed character area (NO in step S1612), the processing proceeds to step S1613.
In step S1613, the display control unit 112 contrastively displays one or more pairs of the emphasized and the comparative images for each character area in the contrastive display window. Such character-based contrastive display enables the user to grasp which portion of each character composition is likely to be altered based on the emphasized image, and to determine whether each character has been altered by checking the tint or shading of the relevant portion in the comparative image.
In step S1620, the display control unit 112 contrastively displays the emphasized image and the comparative image (read image) representing the entire target document in the contrastive display window. As described above, the present exemplary embodiment enables smooth switching between the overall contrastive display representing the entire document and the character-based contrastive display described above. The user can thereby switch between the schematic grasp of which position's character in the document is likely to be altered in the overall display and the validation of each individual character in the character-based display, and thus the user can efficiently validate the alteration detection result through the target document.
<<5. Modifications>>
The exemplary embodiment has been described above centering mainly on a technique for applying the character area image in each character area to the learned model to determine whether each pixel in the character area belongs to an altered portion. However, the technique for alteration detection is not limited to the above-described example. For example, as a first modification, the alteration detection may be performed not on a pixel basis but on a character basis. The character-based alteration detection can utilize, for example, a learned model generated and/or updated by using a flag indicating whether each character area includes the altered portion, as teacher data, instead of using a teacher image (e.g., the binary image 612 illustrated in
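For illustration, the character-based alteration detection of this first modification can be sketched with a small classifier that outputs a single altered/not-altered estimate for each character area image, trained with a per-area flag as teacher data; PyTorch and the network structure shown here are assumptions for illustration only.

```python
# Sketch of the first modification: a character-level classifier outputting a single
# altered/not-altered estimate per character area image.
import torch
import torch.nn as nn


class CharacterAlterationClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.head = nn.Linear(16 * 8 * 8, 1)  # single logit: character area altered or not

    def forward(self, x):                      # x: (N, 1, H, W) character area images
        h = self.features(x).flatten(1)
        return self.head(h)


# prob_altered = torch.sigmoid(CharacterAlterationClassifier()(char_area_batch))
```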
In a second modification, the alteration detection may be performed by using an autoencoder type model (e.g., a Variational Autoencoder (VAE)) for encoding a character area image to extract a feature quantity of the character, instead of using the determination type model. The learning processing of the VAE includes encoder processing for calculating a dimension-reduced feature quantity based on the input image, and decoder processing for restoring the input image based on the calculated feature quantity. The encoder has a neural network structure, and the decoder has an inverse structure of the encoder. A model error is evaluated as the difference (e.g., cross entropy) between the input and the restored images. The values of the model parameters are adjusted through, for example, the back-propagation method so that the error decreases. In this case, an altered learning image and teacher data are not used. In the learning stage, the learning apparatus 102 generates and/or updates the learned model for extracting the feature quantity of an unaltered character from the character area image through the learning processing using a plurality of original learning images. Typically, it is not realistic to generate and/or update a single model from which a suitable feature quantity can be extracted for all characters. The learning apparatus 102 can therefore learn the values of different model parameters for each character type. For example, characters “1”, “2” and “3” may be handled as different character types. In the alteration detection stage, for example, the alteration detection server 103 applies each character area image to the encoder of the learned model corresponding to the character type recognized as a result of the OCR and extracts the feature quantity from the character area image. The alteration detection server 103 acquires a reference feature quantity pre-extracted from a known unaltered character area image having the same character type. If the difference between the two feature quantities satisfies a predetermined condition (e.g., if the Manhattan distance is equal to or less than a threshold value), the alteration detection server 103 can determine that the character area image does not include an altered portion. In contrast, if the Manhattan distance between the feature quantity extracted from the character area image and the reference feature quantity exceeds the above-described threshold value, the alteration detection server 103 can determine that the character area image includes an altered portion. According to the present modification, like the first modification, the alteration detection server 103 determines whether each of the characters recognized in the processing target image includes an altered portion (i.e., character-based alteration detection). The contrastive display of the emphasized and the comparative images according to the present modification can be performed similarly to the example described above with reference to
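For illustration, the detection step of this second modification can be sketched as follows, assuming an encoder that maps a character area image to a feature vector; the encoder interface and the threshold value are assumptions for illustration only.

```python
# Sketch of the second modification's detection step: extract a feature quantity from a
# character area image with the encoder of the learned model for the recognized character
# type, and compare it with a pre-extracted reference feature using the Manhattan (L1)
# distance. The encoder is assumed to return a feature tensor for the input image.
import torch


def is_altered(encoder, char_area_image, reference_feature, threshold):
    """encoder: encoder of the learned (e.g., VAE) model for this character type.
    char_area_image: (1, 1, H, W) tensor. reference_feature: 1-D tensor pre-extracted
    from a known unaltered character area image of the same character type."""
    with torch.no_grad():
        feature = encoder(char_area_image).flatten()
    manhattan_distance = torch.sum(torch.abs(feature - reference_feature)).item()
    return manhattan_distance > threshold  # exceeds threshold -> estimated to include an altered portion
```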
The above-described exemplary embodiment centers mainly on an example where the emphasized and the comparative images are horizontally arranged in the contrastive display window. However, the emphasized and the comparative images may be arranged in any desired direction other than the horizontal direction. The emphasized and the comparative images may also be displayed in different windows. As a third modification, the emphasized and the comparative images may also be displayed at different timings instead of being spatially and contrastively arranged. For example, the emphasized image display for X seconds and the comparative image display for X seconds may be alternately and repetitively performed in a single image display area. The term “contrastive display” according to the present specification includes all of these display modes.
<<6. Summary>>
Exemplary embodiments of the present disclosure have been described in detail above with reference to
The above-described exemplary embodiment makes it possible to contrastively display the emphasized and the comparative images according to a comparison mode selected by the user from two or more comparison modes having different contents of the comparative image. A first comparison mode enables the user to instantly grasp the correspondence between portions of character compositions in the emphasized and the comparative images. A second comparison mode enables the user to evaluate factors, such as a tint, shading, and stroke features, based on the characters before and after the character that is likely to be altered, thus validating the alteration detection result. A third comparison mode enables the user to refer to factors, such as a tint, shading, and stroke features, of the identical character (in another character area) to the character that is likely to be altered, thus validating the alteration detection result. A fourth comparison mode enables the user to view the contents of the document in a state where there are no strokes that may possibly have been added through the alteration, and to validate the alteration detection result by focusing only on the pixels determined not to belong to the altered portion. Enabling flexible switching between these comparison modes allows the user to efficiently perform the operation for validating the alteration detection result.
The above-described exemplary embodiment can provide the user with user interfaces that enable the user to correct the alteration detection result indicating which portion of the read image is determined to have been altered. When the user finds a detection error by monitoring the contrastive display of the emphasized and the comparative images, the above-described configuration enables the user to promptly correct the detection error. This makes it possible to smoothly transfer the alteration detection result suitably corrected by the user to man-powered or systematic processing following the validation of the presence or absence of alterations.
A certain exemplary embodiment makes it possible to determine whether each of a plurality of pixels in the read image belongs to an altered portion, and emphasize the pixels determined to belong to the altered portion in the emphasized image. In this case, the detailed alteration detection result can be visually presented to the user. For example, in a case of an alteration in which strokes have been added to an individual character, only the added strokes are emphasized. A certain modification makes it possible to determine whether each of one or more character areas in the read image includes an altered portion, and emphasize the characters in the character areas determined to include the altered portion in the emphasized image. In this case, since the load of calculation processing required for the alteration detection is low, the alteration detection result can be promptly presented to the user even if the read image has a large amount of data.
Embodiment(s) of the present disclosure can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present disclosure includes exemplary embodiments, it is to be understood that the disclosure is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2019-212299, filed Nov. 25, 2019, and Japanese Patent Application No. 2019-212526, filed Nov. 25, 2019, which are hereby incorporated by reference herein in their entirety.