The present disclosure relates to an image processing apparatus that performs handwritten character recognition using machine learning, a control method thereof, and a storage medium.
Japanese Patent Laid-Open No. 2022-61192 proposes a technique for analyzing image data by selecting a plurality of trained models having a hierarchical relationship from a plurality of trained models.
In handwritten character recognition, each person writes with their own idiosyncrasies. Therefore, a word recognition rate or a character recognition rate can be further increased if, instead of a common trained model, individually trained models are used in accordance with the idiosyncrasies and features of each user's handwritten characters.
The present disclosure enables realization of a mechanism for suitably selecting a trained model to be used for handwritten character recognition from a plurality of trained models.
One aspect of the present disclosure provides an image processing apparatus comprising: at least one memory device that stores a set of instructions; and at least one processor that executes the set of instructions to obtain image data generated by reading a sheet on which a handwritten character is written; select a trained model to be used for character recognition of the handwritten character on the sheet from a plurality of trained models; and execute character recognition by using the selected trained model.
Another aspect of the present disclosure provides a method of controlling an image processing apparatus, the method comprising: obtaining image data generated by reading a sheet on which a handwritten character is written; selecting a trained model to be used for character recognition of the handwritten character on the sheet from a plurality of trained models; and executing character recognition by using the selected trained model.
Still another aspect of the present disclosure provides a non-transitory computer-readable storage medium storing a program for causing a computer to execute each step of a method of controlling an image processing apparatus, the method comprising: obtaining image data generated by reading a sheet on which a handwritten character is written; selecting a trained model to be used for character recognition of the handwritten character on the sheet from a plurality of trained models; and executing character recognition by using the selected trained model.
Further features of the present disclosure will be apparent from the following description of exemplary embodiments with reference to the attached drawings.
Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed disclosure. Multiple features are described in the embodiments, but limitation is not made to a disclosure that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
Hereinafter, an embodiment of the present disclosure will be described with reference to the drawings. First, the overall configuration of a system according to the present embodiment will be described with reference to
The system is configured to include an image processing apparatus 100, a training server 200, a general-purpose computer 130, and a data server 150. These devices are connected via a LAN 140 such as a wired LAN, and can transmit and receive data to and from each other. The image processing apparatus 100 is an apparatus having an image processing function, such as a printer, a multifunction peripheral, a FAX, or a scanner. The image processing apparatus 100 may or may not include a reading unit such as a scanner. In a case where a reading unit is not provided, handwritten character recognition, which will be described later, is performed using image data read from a sheet or the like by another apparatus. In the present embodiment, a multifunction peripheral (MFP) is described as an example of the image processing apparatus.
The general-purpose computer 130 transmits print data to the image processing apparatus 100. The data server 150 collects training data used for machine learning in the training server 200 from an external device and provides the collected training data to the training server 200. The training server 200 generates a model using image data read from handwritten character sheets (documents) or the like externally provided as supervisory data, and provides the generated model to the image processing apparatus 100. Note that the types and numbers of these apparatuses are merely examples, and are not intended to limit the present disclosure. For example, multiple apparatuses may be integrally provided, or the disclosure may be realized by distributing functions across more apparatuses. More specifically, the image processing apparatus 100 may be configured to have at least one function of the training server 200 and the data server 150. Alternatively, the training server 200 may be configured to have at least one of a function other than the reading function of the image processing apparatus 100 and a function of the data server 150.
Next, an example of a configuration of the image processing apparatus 100 according to the present embodiment will be described with reference to
The CPU 101 controls overall operation of the image processing apparatus 100. The CPU 101 reads a control program stored in the ROM 102 or the storage 104, and performs various controls such as reading control and print control. The ROM 102 stores a control program that can be executed by the CPU 101, and stores a boot program that is executed at startup. The RAM 103 is a main storage memory of the CPU 101 and is used as a work area and as a temporary storage region for loading various control programs stored in the ROM 102 and the storage 104. The storage 104 stores print data, image data, various programs, various setting information, and the like. In addition, some processes may be executed by using hardware circuitry such as an Application Specific Integrated Circuit (ASIC) or a Field-Programmable Gate Array (FPGA).
The operation unit I/F 107 is connected to the operation unit 108 and receives user operations performed on the operation unit 108. The scanner controller I/F 114 is connected to the reading unit 115 and obtains the image data read from a scanned sheet. The reading unit 115 reads an image on the sheet to generate image data. The image data generated by the reading unit 115 is transmitted to an external device such as the training server 200, the data server 150, or the general-purpose computer 130, or is used to print an image onto a sheet. In addition, the read image data is used as an input for OCR. OCR is an abbreviation for Optical Character Recognition.
The printer controller I/F 112 is connected to the printing unit 113 and controls data exchange with the printing unit 113. Image data to be printed is transferred to the printing unit 113 via the printer controller I/F 112. The printing unit 113 receives a control command and image data to be printed, and prints an image based on the image data on a sheet. The printing method of the printing unit 113 may be an electrophotographic method or an inkjet method.
The communication unit I/F 105 is connected to the communication connector 106, and connects the image processing apparatus 100 to a network (not illustrated) via the communication connector 106 to control communication with an external device. The UI control unit 109 is connected to the display unit 110, which may be an LCD or the like, and performs display control of the display unit 110. An image processing unit 111 rotates, compresses, and converts the resolution of image data outputted to the printing unit 113 via the printer controller I/F 112. The external storage I/F 117 is connected to the external storage connector 118 and writes or reads data to or from an external storage such as a USB memory.
In the present embodiment, the storage 104 holds a plurality of models such as a trained model A 119, a trained model B 120, a trained model C 121, and a default trained model 124, which are models trained according to a machine learning method. The storage 104 also stores a model selection program 122 that selects a model to be used in the inference phase among the trained models. In the storage 104, information for logging into the image processing apparatus 100, such as login information 123, is stored. These trained models are obtained from an external device such as the training server 200, which can communicate via the communication connector 106, a network storage, or the like, or an external storage such as a USB memory that can be connected via the external storage connector 118.
Next, an example of a configuration of the training server 200 according to the present embodiment will be described with reference to
The training server 200 is what is generally referred to as a personal computer, and the CPU 201 provided therein controls the training server 200 overall. The CPU 201 directly reads and executes programs stored in the ROM 202 and the storage 204, or reads and executes programs after loading them into the RAM 203. The communication unit I/F 205, under control by the CPU 201, transmits data generated by the CPU 201 to, for example, the image processing apparatus 100, which is connected to a network (not illustrated), via the communication connector 206. The communication unit I/F 205 performs control for transmitting and receiving data to and from the image processing apparatus 100.
The display 208 is an output apparatus for performing display for a user, and is connected via the display I/F 207. The keyboard/mouse 210 is an input apparatus that accepts operations from a user and is connected via the input I/F 209. The user can operate the training server 200 using the keyboard/mouse 210 while confirming the display on the display 208. The external storage connector 212 connects an external storage such as a USB memory (not illustrated) and writes or reads data to or from a connected storage. The external storage connector 212 is connected via the external storage I/F 211.
The storage 204 stores a training program 213. In the present embodiment, the training program 213 is activated by a user operation and operates to output the trained model A 119, the trained model B 120, the trained model C 121, and the like. The outputted trained model A 119, trained model B 120, and trained model C 121 are stored in an external storage and installed in the image processing apparatus 100.
Next, an example of supervised data for training according to the present embodiment will be described with reference to
The training program 213 similarly includes a program that causes the training server 200 to read training data 304 in order to output the trained model B 120. Image data 305 is image data of characters handwritten by Mr. B, and label data 306 is ground truth data of the image data 305. The training program 213 outputs the trained model B 120 from the training data 304. Note that this training data is merely an example and is not intended to limit the present disclosure, and other known training data may be used.
Next, an example of operation when the training server 200 executes the training program 213 will be described with reference to
When the CPU 201 executes the training program 213, the CPU 201 reads data and ground truth data thereof, and generates a trained model for outputting optimal inference results. The handwritten character image data 401 is a plurality of pieces of image data, and corresponds to the image data 302 and the image data 305 in
In a block 403, the parameters of the trained model are adjusted, and a trained model 404 is outputted by learning from the image data 401 and the label data 402, which is the ground truth data thereof, as inputs. For example, in a case where the input to the block 403 is the training data 301, the trained model A 119 is outputted, and in a case where the input is the training data 304, the trained model B 120 is outputted. Note that the learning method of the present disclosure is not intended to be limited to the above-described method, and any known learning method for outputting a trained model may be applied.
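Purely for illustration, and not as part of the disclosed embodiments, the following sketch shows one way the learning in the block 403 could be realized. The use of a simple support-vector classifier on flattened character images, the helper name train_user_model, and the output file names are assumptions introduced here; as stated above, any known learning method may be substituted.

```python
# Hypothetical sketch of block 403: image data 401 and its ground-truth label
# data 402 are learned, and a per-user trained model (e.g. trained model A 119)
# is written out. The classifier and file layout are illustrative assumptions.
import numpy as np
import joblib
from sklearn.svm import SVC

def train_user_model(images: np.ndarray, labels: list[str], out_path: str) -> None:
    """images: (N, H, W) grayscale character crops; labels: ground-truth text per crop."""
    x = images.reshape(len(images), -1) / 255.0   # flatten and normalize pixel values
    model = SVC(kernel="rbf")                     # parameters are adjusted during fitting
    model.fit(x, labels)                          # learn from image data and label data
    joblib.dump(model, out_path)                  # e.g. "trained_model_A.joblib"

# Example with hypothetical data:
# train_user_model(images_of_mr_a, labels_of_mr_a, "trained_model_A.joblib")
```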
Hereinafter, an example of screens displayed on the display unit 110 of the image processing apparatus 100 will be described with reference to
The machine learning screen 503, which is an example of a selection menu, is configured to include buttons 504 to 508. The button 504 is a button for loading a trained model. The button 505 is a button for linking a logged-in user and a trained model. The button 506 is a button for selecting a trained model. The button 507 is a button for confirming a trained model. The button 508 is a button for linking a display language and a trained model.
When the button 504 is pressed, a transition is made to a screen 509 for loading the trained model of
In the user ID input field 701, a user ID for identifying the user is inputted via a physical keyboard, a virtual keyboard, or the like (not illustrated). In the password input field 702, a password linked with the user ID is inputted. When the login button 703 is pressed in a state in which the user ID and the password are inputted, the CPU 101 compares the inputted user ID and password with a user ID and password stored in advance in the storage 104 of the image processing apparatus 100. The CPU 101 then allows the user to log in if the result of the comparison is a match. The login method is not limited to the present embodiment, and any known login method may be applied.
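As a minimal, non-limiting sketch of the credential comparison described above, the check could look as follows; the storage of hashed password digests and the helper name verify_login are assumptions for illustration only.

```python
# Hypothetical sketch of the login check: the inputted user ID and password are
# compared with credentials held in advance (here, hashed) in the storage 104.
import hashlib
import hmac

def verify_login(user_id: str, password: str, stored_digests: dict[str, str]) -> bool:
    """stored_digests maps user IDs to hex SHA-256 digests of the registered passwords."""
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    expected = stored_digests.get(user_id)
    return expected is not None and hmac.compare_digest(digest, expected)
```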
Next, a data flow in an inference phase according to the present embodiment will be described with reference to
A block 802 indicates a functional module generated by executing the model selection program. When input data 801 is inputted to the block 802, the functional module links the logged-in user with the trained model to select which trained model to use from the plurality of trained models. In the example of
Next, a processing procedure of the image processing apparatus 100 that links the logged-in user and the trained model according to the present embodiment will be described with reference to
In step S1001, the CPU 101 waits until a predetermined user logs into the image processing apparatus 100. Once the user has logged in, the CPU 101 proceeds to step S1002 and searches for a trained model linked with the logged-in user. Next, in step S1003, the CPU 101 determines whether a trained model linked with the logged-in user was found in the search of step S1002. If a trained model was found, the CPU 101 proceeds to step S1004; otherwise, the CPU 101 proceeds to step S1005.
In step S1004, the CPU 101 selects a trained model linked with the logged-in user, sets the model to be used for recognizing handwritten characters, and proceeds to step S1006. On the other hand, in step S1005, if a trained model linked with the logged-in user is not found, the CPU 101 selects the default trained model 124, sets the default trained model 124 to be used for recognizing handwritten characters, and proceeds to step S1006.
In step S1006, the CPU 101 waits until the user performs a handwritten character recognition scan, and proceeds to step S1007 when a handwritten character recognition scan is performed. In step S1007, the CPU 101 causes the reading unit 115 to scan a sheet on which handwritten characters are written, stores the outputted scanned image in the storage 104, and executes the model selection program 122 using the stored scanned image as the input data 801.
Next, in step S1008, the CPU 101 executes the model selection program 122 to perform handwritten character recognition using the trained model selected from the plurality of trained models in step S1004 or step S1005. Subsequently, in step S1009, the CPU 101 waits for the output of the handwritten character recognition from the model executed in step S1008, and proceeds to step S1010 when the handwritten character recognition result is outputted. In step S1010, the CPU 101 stores handwritten character recognition output data 806 in the storage 104, and ends the processing of this flowchart.
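As a non-limiting sketch of steps S1002 to S1008, the selection of a trained model linked with the logged-in user, the fallback to the default trained model 124, and the recognition call could be organized as follows; the mapping table user_model_map and the segmentation of the scanned image into character crops are assumptions for illustration.

```python
# Hedged sketch: select the trained model linked to the logged-in user
# (S1002-S1004), fall back to the default model if none is linked (S1005),
# then run recognition on segmented character crops from the scan (S1007-S1008).
import numpy as np
import joblib

def select_model_for_user(user_id: str, user_model_map: dict[str, str],
                          default_model_path: str):
    """Return the trained model linked with the user, or the default trained model."""
    path = user_model_map.get(user_id, default_model_path)
    return joblib.load(path)

def recognize_handwriting(model, character_crops: np.ndarray) -> str:
    """character_crops: (N, H, W) segmented handwritten characters from the scan."""
    x = character_crops.reshape(len(character_crops), -1) / 255.0
    return "".join(model.predict(x))   # result stored in the storage 104 (S1010)
```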
As described above, the image processing apparatus according to the present embodiment reads a sheet on which handwritten characters are written and obtains the outputted image data. The image processing apparatus then selects, based on information relating to the handwritten characters on the sheet, a trained model to be used for character recognition of the handwritten characters from a plurality of trained models, and performs character recognition using the selected trained model. In particular, the image processing apparatus selects, from the plurality of trained models, a trained model linked with the user logged into the image processing apparatus, and uses the selected trained model for recognition of the handwritten characters. Thus, according to the present embodiment, it is possible to suitably select a trained model to be used for handwritten character recognition from a plurality of trained models.
Hereinafter, a second embodiment of the present disclosure will be described. In the present embodiment, a form in which a trained model corresponding to a user operation is selected from a plurality of trained models and used for recognition of handwritten characters will be described.
First, an example of a selection screen for selecting an arbitrary trained model in the present embodiment will be described with reference to
When the trained model selection button 506 is pressed on the machine learning screen 503 in
Next, a processing procedure of the image processing apparatus 100 for when an arbitrary trained model is selected by a user operation according to the present embodiment will be described with reference to
In step S1201, the CPU 101 determines whether the setting of the trained model has been changed by the user. Here, the CPU 101 waits until the setting is changed by the user, and proceeds to step S1202 when the setting is changed. In step S1202, the CPU 101 changes the setting so as to use the trained model set in the selection screen 1101. More specifically, the CPU 101 stores the setting change information in the storage 104. This may be realized, for example, by holding flag information or the like in association with the newly set trained model, the flag information identifying it as the trained model to be used. When the information indicating the change in the selection of the trained model is stored in the storage 104, the processing proceeds to step S1203. The processes of step S1203 to step S1207 are similar to the processes of step S1006 to step S1010 described with reference to
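One possible, purely illustrative way to hold the setting change of step S1202 is sketched below; the use of a JSON file as a stand-in for the storage 104 and the key name selected_trained_model are assumptions, not the disclosed implementation.

```python
# Hypothetical sketch of persisting which trained model the user selected on the
# selection screen 1101; a JSON file stands in for the storage 104.
import json
from pathlib import Path

SETTINGS_PATH = Path("model_settings.json")       # hypothetical location

def set_selected_model(model_name: str) -> None:
    """Store flag-like information identifying the trained model to be used (S1202)."""
    settings = json.loads(SETTINGS_PATH.read_text()) if SETTINGS_PATH.exists() else {}
    settings["selected_trained_model"] = model_name
    SETTINGS_PATH.write_text(json.dumps(settings))

def get_selected_model(default: str = "default_trained_model") -> str:
    """Return the stored selection, or the default trained model if none was set."""
    settings = json.loads(SETTINGS_PATH.read_text()) if SETTINGS_PATH.exists() else {}
    return settings.get("selected_trained_model", default)
```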
As described above, the image processing apparatus according to the present embodiment selects a trained model corresponding to a user operation from a plurality of trained models and uses the selected model for recognition of handwritten characters. Thus, according to the present embodiment, it is possible to suitably select a trained model to be used for handwritten character recognition from a plurality of trained models.
Hereinafter, a third embodiment of the present disclosure will be described. In the present embodiment, a form will be described in which a trained model linked with a designated language such as Japanese, English, or French is selected from a plurality of trained models and used for recognition of handwritten characters.
First, an example of a setting screen for performing a setting for linking a language setting and a trained model in the present embodiment will be described with reference to
When the button 508 for linking a display language with a trained model is pressed from the machine learning screen 503 of
Next, a processing procedure of the image processing apparatus 100 that links a language setting and a trained model according to the present embodiment will be described with reference to
In step S1401, the CPU 101 determines whether the display language of the image processing apparatus 100 has been changed by the user. Here, the CPU 101 waits until the setting is changed by the user, and proceeds to step S1402 when the setting is changed. In step S1402, the CPU 101 changes the setting to use the trained model corresponding to the display language designated in the setting screen 1301. More specifically, the CPU 101 stores the setting change information in the storage 104. The storage method may be any method, similarly to step S1202 described above. As the setting of the trained model, for example, a plurality of trained models corresponding to the designated display language may be selectably displayed, and a predetermined trained model may be selected from the displayed candidates in accordance with a user operation. Alternatively, a trained model corresponding to a predetermined user, such as a logged-in user, may be selected from a plurality of trained models corresponding to a designated display language. When information related to a change in the setting is stored in the storage 104, the processing proceeds to step S1403. The processes of step S1403 to step S1407 are similar to the processes of step S1006 to step S1010 described with reference to
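A minimal sketch of the display-language-to-model linking of steps S1401 and S1402 is given below, assuming a simple lookup table; the language codes, file names, and table contents are illustrative assumptions only.

```python
# Hypothetical sketch: a display language designated on the setting screen 1301
# is linked to a trained model, with a fallback to the default trained model.
LANGUAGE_MODEL_MAP = {
    "ja": "trained_model_japanese.joblib",
    "en": "trained_model_english.joblib",
    "fr": "trained_model_french.joblib",
}

def select_model_for_language(display_language: str,
                              default_model_path: str = "default_model.joblib") -> str:
    """Return the path of the trained model linked with the designated display language."""
    return LANGUAGE_MODEL_MAP.get(display_language, default_model_path)
```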
As described above, the image processing apparatus according to the present embodiment selects a trained model linked with a target language such as Japanese, English, or French from a plurality of trained models and uses the selected trained model for recognition of handwritten characters. Thus, according to the present embodiment, it is possible to suitably select a trained model to be used for handwritten character recognition from a plurality of trained models.
Hereinafter, a fourth embodiment of the present disclosure will be described. In the present embodiment, a form will be described in which identification information of a user is obtained from a sheet to be read, and a trained model linked with the obtained identification information is selected from a plurality of trained models and used for recognition of handwritten characters.
First, an example of input data and output data according to the present embodiment will be described with reference to
Input data 1501 is image data obtained by using the reading unit 115 of the image processing apparatus 100 to read a sheet on which a user has handwritten characters. A user name of the logged-in user or the like is written into a user designation region 1502, which is a designated region in the input data 1501. A user recognition module 1602, which will be described later, recognizes the user name (user identification information) written in the user designation region 1502, and notifies the model selection program 122 to use the trained model linked with the read user name. The model selection program 122 selects the trained model linked with the user name from a plurality of trained models. Here, although it is described that predetermined information is notified to the program, this means that the information is inputted as input data to a functional module realized by, for example, executing the model selection program 122. The same applies in the following description. Although an example in which the program is executed has been described here, the functional module may be implemented by hardware.
Output data 1503 indicates text data outputted from the selected trained model 119, 120, or 121 when the input data 1501 is inputted. In the present embodiment, the logged-in user name written in the user designation region 1502 is not converted into text in the output data 1503. That is, in the present embodiment, a recognition result for the handwritten characters written in regions other than the user designation region 1502 in the input data 1501 is outputted. Note that the present disclosure is not limited thereto, and a recognition result for the user name written in the user designation region 1502 may be included in the output data.
Next, a data flow in an inference phase according to the present embodiment will be described with reference to
When the input data 1501 is inputted to the user recognition module 1602, the user recognition module 1602 recognizes the logged-in user name written in the user designation region 1502 of the input data 1501, and notifies the model selection program 122 of the logged-in user name. By executing the model selection program 122, a trained model corresponding to the logged-in user name received from the user recognition module 1602 is selected. In the present embodiment, the trained model A 119 is selected, and the selected trained model A 119 receives the input data 1501 as input and outputs the output data 1503.
Next, a processing procedure of the image processing apparatus 100 that links user identification information obtained from a sheet and a trained model according to the present embodiment will be described with reference to
In step S1701, the CPU 101 determines whether or not the reading unit 115 has read a sheet on which handwritten characters have been written. When a read is completed, the CPU 101 proceeds to step S1702 and saves the input data 1501, which is the read-out image, into the storage 104. Subsequently, in step S1703, the CPU 101 performs optical character recognition on the portion of the read image corresponding to the user designation region 1502. Further, the CPU 101 identifies the logged-in user from the character recognition result of the user designation region 1502, and searches whether a trained model linked with the logged-in user is registered.
Next, in step S1704, the CPU 101 proceeds to step S1705 when a result of the search is that a trained model linked with the logged-in user is registered, and proceeds to step S1706 when such a trained model is not registered. In step S1706, the CPU 101 makes a setting to use the default trained model. Meanwhile, in step S1705, the CPU 101 makes a setting to use a trained model linked to the logged-in user.
Next, in step S1707, the CPU 101 performs handwritten character recognition by inputting, to the trained model set in step S1705 or step S1706, data obtained by removing the user designation region 1502 from the input data 1501, and then proceeds to step S1708. The processes of step S1708 and step S1709 are similar to the processes of step S1009 and step S1010 of
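As a hedged sketch of steps S1701 to S1707, the user designation region could be recognized and then excluded from the recognition input as follows; the region coordinates, the use of pytesseract for the region OCR, and the mapping user_model_map are assumptions introduced for illustration, and the subsequent recognition can proceed as in the earlier per-user sketch.

```python
# Hypothetical sketch: recognize the user name in the user designation region 1502
# (S1703), select the linked trained model or the default (S1704-S1706), and blank
# the region so that only the remaining handwriting is passed to recognition (S1707).
import joblib
from PIL import Image
import pytesseract

USER_REGION = (50, 50, 600, 150)   # hypothetical (left, top, right, bottom) of region 1502

def select_model_from_sheet(scan_path: str, user_model_map: dict[str, str],
                            default_model_path: str):
    page = Image.open(scan_path).convert("L")
    user_name = pytesseract.image_to_string(page.crop(USER_REGION)).strip()
    model_path = user_model_map.get(user_name, default_model_path)
    page.paste(255, USER_REGION)   # exclude region 1502 from the recognition input
    return joblib.load(model_path), page
```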
As described above, the image processing apparatus according to the present embodiment obtains identification information of a user from a sheet to be read, selects a trained model linked with the obtained identification information from a plurality of trained models, and uses the trained model for recognition of handwritten characters. Thus, according to the present embodiment, it is possible to suitably select a trained model to be used for handwritten character recognition from a plurality of trained models.
Hereinafter, a fifth embodiment of the present disclosure will be described. In the present embodiment, a form will be described in which a feature of a handwritten character that is the target of character recognition is extracted, and a trained model linked with the extracted feature is selected from a plurality of trained models and used for recognition of handwritten characters.
First, a data flow in an inference phase according to the present embodiment will be described with reference to
When the input data 801 is inputted to a machine learning model 1802 for selecting a trained model, the machine learning model 1802 reads a feature of the handwritten characters written in the input data 801, and searches whether a user linked with that feature is registered. Here, as a method of extracting a feature of a handwritten character, a known extraction method is used; for example, the size of a character, the center of gravity, the inclination, the aspect ratio, and the like may be extracted as a feature amount. The extracted feature amount is compared with the feature amounts of users who have already been registered to obtain a degree of similarity, and a user with a high degree of similarity can be decided on as the corresponding user. If a user corresponding to the feature is registered as a result of the search, the module that executed the model selection program 122 is notified of that user.
The module that executed the model selection program 122 selects a trained model corresponding to the user received from the machine learning model 1802 from a plurality of trained models. In the present embodiment, the trained model A 119 is selected, and the selected trained model A 119 receives the input data 801 and outputs the output data 806.
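The feature comparison described above could be sketched as follows; the particular feature amounts (size, center of gravity, aspect ratio), the cosine-similarity measure, the threshold, and the registered-feature table are assumptions chosen purely to illustrate the idea.

```python
# Hypothetical sketch of the role of the machine learning model 1802: extract simple
# feature amounts from a handwritten character image and find the registered user
# whose stored features are most similar.
import numpy as np

def handwriting_features(char_crop: np.ndarray) -> np.ndarray:
    """char_crop: 2-D array (H, W) in which ink pixels are > 0."""
    ys, xs = np.nonzero(char_crop)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    size = float(height * width)                    # character size
    cy, cx = float(ys.mean()), float(xs.mean())     # center of gravity
    aspect = float(width) / float(height)           # aspect ratio
    return np.array([size, cy, cx, aspect])

def most_similar_user(features: np.ndarray, registered: dict[str, np.ndarray],
                      threshold: float = 0.95):
    """Return the registered user with the most similar features, or None if below threshold."""
    best_user, best_sim = None, -1.0
    for user, ref in registered.items():
        sim = float(np.dot(features, ref) /
                    (np.linalg.norm(features) * np.linalg.norm(ref) + 1e-12))
        if sim > best_sim:
            best_user, best_sim = user, sim
    return best_user if best_sim >= threshold else None
```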
Next, a processing procedure of the image processing apparatus 100 that links an extracted feature and a trained model according to the present embodiment will be described with reference to
In step S1901, the CPU 101 determines whether or not the reading unit 115 has read a sheet on which handwritten characters have been written. When a read is completed, the CPU 101 proceeds to step S1902 and saves the input data 801, which is the read-out image, into the storage 104. Subsequently, in step S1903, the CPU 101 inputs the input data 801 into the machine learning model 1802 for selecting the trained model, obtains a feature amount of the handwritten characters, and searches for an already registered user having a similar feature amount. For example, if a user for whom the degree of similarity is equal to or greater than a predetermined value is registered, that user is determined to be the user who wrote the handwritten characters. On the other hand, if only users for whom the degree of similarity is less than the predetermined value are registered, it is determined that no user with a similar feature is registered. If a user with a similar feature is registered as a result of the search in step S1904, the processing proceeds to step S1905, and if not, the processing proceeds to step S1906.
In step S1906, the CPU 101 makes a setting to use the default trained model, and the processing proceeds to step S1907. Meanwhile, in step S1905, the CPU 101 makes a setting to use a trained model linked to a registered user, and the processing proceeds to step S1907. Next, in step S1907, the CPU 101 inputs the input data 801 to the trained model selected in step S1905 or step S1906 to perform handwritten character recognition. Subsequently, in step S1908, the CPU 101 waits until a handwritten character recognition result is outputted by the trained model. When a result is outputted, the CPU 101 proceeds to step S1909, stores the outputted output data 806 in the storage 104, and ends the processing of this flowchart.
As described above, the image processing apparatus according to the present embodiment extracts a feature of the handwritten characters that are the target of character recognition, selects a trained model linked with the extracted feature from a plurality of trained models, and uses the selected trained model for recognition of the handwritten characters. Thus, according to the present embodiment, it is possible to suitably select a trained model to be used for handwritten character recognition from a plurality of trained models.

In the above-described embodiments, an example has been described in which the image processing apparatus 100, which includes the reading unit 115, reads an image of a document and performs character recognition. However, the present disclosure is not limited thereto, and a server capable of communicating with the image processing apparatus 100 may receive image data generated by the image processing apparatus 100 reading a document and may perform character recognition on the received image data. At that time, a trained model may be selected from a plurality of trained models by one of the methods of the above-described embodiments, and character recognition processing may be executed using the selected trained model. At this time, the screens of
Further, in the above embodiment, an example has been described in which a trained model corresponding to a feature having a high degree of similarity with an extracted feature amount is selected from a plurality of trained models. However, the present disclosure can be variously modified, and a degree of similarity extracted for each trained model may be displayed on a selection screen or the like to provide information to the user, and a trained model selected based on a subsequent user operation may be used for handwritten character recognition.
Embodiment(s) of the present disclosure can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the disclosure is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2022-126608, filed Aug. 8, 2022, which is hereby incorporated by reference herein in its entirety.