This application is based on and claims priority under 35 USC 119 from Japanese Patent Application No. 2022-007061 filed Jan. 20, 2022.
The present disclosure relates to an information processing apparatus, a non-transitory computer readable medium, and an information processing method.
For example, Japanese Unexamined Patent Application Publication No. 2018-124810 discloses an image forming apparatus including the following: an obtaining unit that obtains manuscript image data; a communication interface for communicating with an external apparatus that performs first optical character recognition processing on the manuscript image data; an optical character recognition processor that performs second optical character recognition processing, which is simpler processing than the first optical character recognition processing; and a controller that determines whether to execute the first optical character recognition processing on the basis of a result of recognition by the second optical character recognition processing, and generates a document file using at least one of a result of the first optical character recognition processing or a result of the second optical character recognition processing in accordance with the result of the determination.
Here, in the case where a user selects an apparatus for performing optical character recognition processing from among a plurality of apparatuses, it is difficult to select an apparatus suited to the situation, and, as the number of apparatuses increases, the user's burden in selecting an apparatus is assumed to increase.
Aspects of non-limiting embodiments of the present disclosure relate to reducing, as compared to the case where a user selects an apparatus that performs optical character recognition processing from among a plurality of apparatuses, the user's burden in selecting the apparatus.
Aspects of certain non-limiting embodiments of the present disclosure address the above advantages and/or other advantages not described above. However, aspects of the non-limiting embodiments are not required to address the advantages described above, and aspects of the non-limiting embodiments of the present disclosure may not address advantages described above.
According to an aspect of the present disclosure, there is provided an information processing apparatus including a processor configured to: obtain image data; obtain information including at least one of setting information set in advance for optical character recognition processing by a plurality of apparatuses capable of communicating with the information processing apparatus or attribute information of each of the plurality of apparatuses; and based on the obtained image data and the obtained information, determine an apparatus used for optical character recognition processing of the image data from among the plurality of apparatuses.
Exemplary embodiments of the present disclosure will be described in detail based on the following figures, wherein:
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
The information processing system 100 illustrated in
As a network for connecting the image forming apparatus 10 and the server apparatuses 20 to 40, for example, a local area network (LAN) or the Internet is used. Needless to say, the network may be configured as a composite type including a LAN and the Internet.
In addition to a function of printing an image on paper, the image forming apparatus 10 also includes a scanning function of optically reading an image of a manuscript or the like, and an optical character reader (OCR) function of optically recognizing the read image as characters. The image forming apparatus 10 is also referred to as a multifunctional peripheral (MFP). In addition, the image forming apparatus 10 may be a so-called production printer used for professional printing. Note that the functions listed for the image forming apparatus 10 are only exemplary, and do not prevent other functions from being provided.
For the printing function of the image forming apparatus 10, besides an electrophotographic method in which toner adhering to a charged and exposed photosensitive body is transferred to a recording material to fix and form an image, for example, an inkjet method in which ink is ejected onto a recording material to form an image may be used.
The image forming apparatus 10 includes an operation display unit (for example, see a user interface (UI) 60 illustrated in
Note that the image forming apparatus 10 may be replaced with an information processing apparatus such as a personal computer (PC) or a mobile information terminal such as a smartphone (none of them are illustrated). The image forming apparatus 10 is also an example of an information obtaining apparatus.
The server apparatuses 20 to 40 are configured as shared servers that provide so-called cloud services, and are located in a cloud environment operated at facilities owned by external business operators. More specifically, each of the server apparatuses 20 to 40 is equipped with the above-mentioned OCR function.
Accordingly, the image forming apparatus 10 and the server apparatuses 20, 30, and 40 each have an OCR function; while the OCR function of the image forming apparatus 10 may be referred to as a “built-in OCR”, the OCR function of each of the server apparatuses 20, 30, and 40 may be referred to as a “cloud OCR”. In the case where a cloud OCR is a paid service, for example, a usage amount per page may be set, or a fixed fee may be set for a predetermined number of pages, and, if processed pages exceed the predetermined number of pages, an additional fee may be charged.
Note that each of the server apparatuses 20 to 40 may physically be one computer, or may be realized by distributed processing performed by a plurality of computers.
Here, a built-in OCR and a cloud OCR may differ in features such as performance, including processing speed and accuracy, and processing cost. For example, a built-in OCR is characterized by high processing speed but low accuracy, whereas a cloud OCR is characterized by being capable of analyzing text set in columns, albeit at high cost, and of analyzing text without columns with high accuracy and at low cost.
For this reason, the user needs to grasp the features of each OCR before determining the processing request destination. In particular, when multiple cloud OCRs are available, it is difficult to select a cloud OCR that matches the document subjected to OCR processing. The more cloud OCRs that are available, the more choices the user has, but also the greater the user's burden of selection, which may make the system less user-friendly.
Therefore, in the present exemplary embodiment, on receipt of an instruction to perform OCR processing, the image forming apparatus 10 selects whether to perform the processing using a built-in OCR or a cloud OCR on the basis of document data, presetting, and so forth; in the case of performing the processing using a cloud OCR, the user's burden in selecting a cloud OCR from among the plurality of available cloud OCRs is reduced.
Hereinafter, this will be specifically described.
As illustrated in
The image data obtaining unit 11 obtains image data as a target to be processed. Such data may be obtained using the scanning function of the image forming apparatus 10 or by transmission of data from the outside.
In addition, the image data obtaining unit 11 obtains information indicating processing of image data. The information indicating processing mentioned here is presetting done by the user, and is information that specifies the contents of processing. For example, the information may be information indicating that OCR processing is to be performed, or may be information indicating that, after the OCR processing, translation into another language is to be performed. In addition, the information may be information that specifies whether the OCR processing is performed with priority on speed or reproducibility.
The information obtaining unit 12 obtains setting information (see example of setting information 90 illustrated in
Moreover, the information obtaining unit 12 provides information (see
In addition, the information obtaining unit 12 obtains attribute information (see an example of attribute information 50 illustrated in
The attribute information may be information on the notation aspect of characters or the language of characters in each of the server apparatuses 20 to 40. The information on the notation aspect of characters includes information indicating whether each server apparatus is capable of handling columns or handwritten characters. The information on the language of characters includes whether each server apparatus is capable of handling translation. Moreover, the information on the notation aspect of characters includes the direction of lines of the characters, that is, whether each server apparatus is capable of handling vertical writing, or whether each server apparatus is capable of handling ruby characters, which are furigana (Japanese reading aids).
The information obtaining unit 12 may obtain both the setting information and the attribute information described above, may obtain the setting information without obtaining the attribute information, or may obtain the attribute information without obtaining the setting information. That is, the information obtaining unit 12 obtains at least one of the setting information or the attribute information. Information including the setting information and/or the attribute information obtained by the information obtaining unit 12 may be referred to simply as “information”.
The document analysis unit 13 conducts a document analysis of the obtained image data using a result obtained by the OCR unit 16, which is a built-in OCR. As a result of the document analysis mentioned here, it is determined whether there are columns of text, whether there are handwritten characters, whether the characters are characters of a language other than Japanese, and so forth. In the case where there are columns of text, the number of columns may be identified. The clause “there are handwritten characters” mentioned here includes cases where all the characters are handwritten characters, and also includes cases where printed characters and handwritten characters are mixed.
In addition, in the case where the image data includes illustrations, the document analysis unit 13 may identify the number of illustration areas or identify the number of character areas.
Furthermore, the document analysis unit 13 may determine whether the writing is vertical or horizontal, whether ruby characters are included, and so forth.
The request unit setting unit 14 sets a unit for determining an apparatus used for OCR processing of the image data from among the server apparatuses 20 to 40. The request unit setting unit 14 sets the unit in response to user operation.
The unit mentioned here is a unit determined in advance for an image, such as all of the image data or a part of the image data. In the case where the unit is a part of the image data, the unit may be a unit of one page, or a partial unit on one page of the image data.
The unit mentioned here refers to a unit in the case where some or all of the server apparatuses 20 to 40 are requested to perform OCR processing of image data obtained by the image data obtaining unit 11. More specifically, besides the mode of requesting any one of the server apparatuses 20 to 40 to perform OCR processing of all of the image data, there are the following modes: the mode in which, when some of the server apparatuses 20 to 40 are requested to perform OCR processing, one page or plural pages serve as a unit; and the mode in which, when one page is divided into three parts, one or two parts serve as a unit.
The request destination determination unit 15 determines a request destination(s) from among the server apparatuses 20 to 40 on the basis of information obtained by the information obtaining unit 12 and the result of analyzing image data by the document analysis unit 13. In addition, using request unit setting information of the request unit setting unit 14, the request destination determination unit 15 may determine any one or multiple request destinations from among the server apparatuses 20 to 40.
The request destination determination unit 15 sends image data and necessary information to the determined request destination(s).
Although the request destination determination unit 15 determines a request destination(s) from among the server apparatuses 20 to 40, which are cloud OCRs, this is not the only possible case, and the request destination determination unit 15 may determine whether to use a cloud OCR or a built-in OCR.
The OCR unit 16 is a portion corresponding to the above-mentioned built-in OCR.
Note that the OCR unit 16 may generate OCR data, which serves as the basis for an analysis conducted by the above-described document analysis unit 13, or may perform OCR processing of image data obtained by the image data obtaining unit 11 together with or in place of the server apparatuses 20 to 40.
The processing data reception unit 17 receives a processing result or processing data of the OCR processing from the server apparatus(es) 20 to 40 that has/have been requested to perform the OCR processing.
The output document generation unit 18 generates an output document or an output document file corresponding to the image data on the basis of the processing data received by the processing data reception unit 17.
For the output document generated by the output document generation unit 18, the output document processor 19 performs processing such as printing of the output document locally or transferring the output document to another apparatus.
Here, each function of the image forming apparatus 10 is realized by a central processing unit (CPU) 10A, which is an example of a processor. The CPU 10A reads a program stored in read-only memory (ROM) 10B, sets random-access memory (RAM) 10C as a work area, and executes the program. The program executed by the CPU 10A may be provided to the image forming apparatus 10 by being stored in a computer-readable recording medium, such as a magnetic recording medium (magnetic tape, magnetic disk, etc.), an optical recording medium (such as an optical disk), a magneto-optical recording medium, or a semiconductor memory. In addition, the program executed by the CPU 10A may be downloaded to the image forming apparatus 10 using communication means such as the Internet.
Although each function of the image forming apparatus 10 is realized by software in the present exemplary embodiment, this is not the only possible case, and each function may be realized by, for example, an application specific integrated circuit (ASIC).
As illustrated in
The transmission/reception unit 21 performs transmission/reception to/from the image forming apparatus 10. That is, the transmission/reception unit 21 receives image data and necessary information from the request destination determination unit 15, and transmits processing data obtained by the processor 22 to the image forming apparatus 10.
The processor 22 is a portion that corresponds to the above-described cloud OCR, and performs OCR processing in response to a request from the image forming apparatus 10. The processor 22 may perform translation processing, for example, besides OCR processing.
Next, the obtaining of information by the information obtaining unit 12 of the image forming apparatus 10 will be described using
In the example of the setting information 90 illustrated in
The usable amount field 90a and the remaining-number-of-pages field 90b are examples of information on billing.
The usable amount field 90a is a field for setting a cost assumed by the user, where the user is able to enter in advance an acceptable upper limit value per page of OCR processing. Therefore, depending on the value in the usable amount field 90a, any of the server apparatuses 20 to 40 may be unavailable. Note that the usable amount field 90a is a field entered by the user in the case where a contract is concluded in a pay-as-you-go system in which the cost of OCR processing is determined according to the number of pages.
The remaining-number-of-pages field 90b is a field for setting a cost, like the usable amount field 90a, but, unlike the usable amount field 90a, the remaining-number-of-pages field 90b is a field entered by the user in the case where a contract is concluded in a fixed fee plan in which, while the fee is fixed until a predetermined number of pages, once the processed pages exceed that number of pages, a pay-as-you-go system is employed. Therefore, a user who wants to reduce the cost enters the number of pages determined in advance by the contract as the remaining number of pages, and, with the request destination determination unit 15 (see
In the case of the present exemplary embodiment, the usable amount field 90a and the remaining-number-of-pages field 90b are entered by the user according to the contract of each of the server apparatuses 20 to 40. Whereas the predetermined number of pages is entered in the remaining-number-of-pages field 90b, for example, the request destination determination unit 15 or the processing data reception unit 17 (see
In addition, a full-flat-rate contract where the fee is fixed regardless of the number of pages is also conceivable; in such a case, a full-flat-rate field is provided in place of the remaining-number-of-pages field 90b.
The processing speed field 90c and the reproducibility field 90d are fields for entering information used when selecting an apparatus that performs OCR processing, and the user is able to specify whether to place priority on processing speed or reproducibility in the case of performing OCR processing. In the setting information 90 illustrated in
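To make the relationships among these fields concrete, the following is a minimal sketch of the setting information 90 as a data structure, written in Python; the field names and values are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SettingInfo:
    # usable amount field 90a: acceptable upper limit per page (pay-as-you-go)
    usable_amount: Optional[float]
    # remaining-number-of-pages field 90b: pages left under a fixed-fee contract
    remaining_pages: Optional[int]
    # processing speed field 90c and reproducibility field 90d
    speed_priority: bool
    reproducibility_priority: bool

# Example: a user who accepts up to 0.5 currency units per page and
# places priority on reproducibility rather than processing speed.
setting_90 = SettingInfo(usable_amount=0.5, remaining_pages=300,
                         speed_priority=False, reproducibility_priority=True)
```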
Next, the attribute information 50 will be described.
The example of the attribute information 50 illustrated in
The attribute information 50 includes attribute information 51 for each item of the server apparatus 20, attribute information 52 for each item of the server apparatus 30, and attribute information 53 for each item of the server apparatus 40.
The index field 50a of the attribute information 50 is a field indicating a serial number given by the information obtaining unit 12, and “1” is given to the attribute information 51 of the server apparatus 20. “2” is given to the attribute information 52 of the server apparatus 30, and “3” is given to the attribute information 53 of the server apparatus 40.
The confidence level field 50b of the attribute information 50 is a field indicating a confidence level, which is an index indicating the performance of OCR processing. The confidence level mentioned here is set by the manufacturer for the apparatus performing OCR processing; it is a value representing the certainty of the character recognition result, and is a concept different from reading accuracy.
The higher the confidence level, the lower the proportion or frequency with which the user makes corrections to the OCR processing result; the lower the confidence level, the higher the user's correction proportion or correction frequency. For example, the confidence level may be a proportion calculated on the basis of the corrections the user has made to recognition results.
In addition, the confidence level in the case of a handwriting OCR that recognizes handwritten characters may be obtained by evaluating, as a rule, the degree of similarity between an input image of handwritten characters and the recognition result, using character recognition technology combined with the human visual mechanism.
In the example illustrated in
The usage amount field 50c is a field indicating a unit usage fee per page in the case of performing OCR processing, and is set according to the performance of OCR processing. The usage fee for OCR processing is the amount obtained by multiplying the unit usage fee by the number of pages.
In the example illustrated in
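As a worked example of the billing described above, the following sketch computes the usage fee from the unit usage fee and the number of pages, and decrements the remaining number of pages under a fixed-fee contract; the numbers are invented for illustration.

```python
def ocr_usage_fee(unit_fee_per_page: float, pages: int) -> float:
    # usage fee = unit usage fee x number of processed pages
    return unit_fee_per_page * pages

def update_remaining_pages(remaining: int, processed: int) -> int:
    # under a fixed-fee contract, pages beyond the allotment fall back
    # to pay-as-you-go billing
    return max(remaining - processed, 0)

print(ocr_usage_fee(0.8, 25))            # 25 pages at 0.8 per page -> 20.0
print(update_remaining_pages(300, 25))   # 275 contract pages remain
```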
The column handling field 50d is a field indicating whether it is possible to perform OCR processing of text set in columns. In the example illustrated in
Note that columns are used for preventing a decrease in readability due to an increase in the number of characters of one line, and two or three columns are set to have a layout where the characters are easy to read. In addition, ruled lines may be used as separations of columns.
The handwritten-character handling field 50e is a field indicating whether it is possible to perform OCR processing in the case where a to-be-processed target includes handwritten characters instead of printed characters. In the example illustrated in
The translation handling field 50f is a field indicating whether it is possible to perform translation processing after OCR processing. In the example illustrated in
The column handling field 50d and the handwritten-character handling field 50e of the attribute information 50 are items of information on characters included in image data and are items of information on the notation aspect of the characters. The information on the notation aspect of the characters mentioned here is information indicating how the characters included in the image data are notated, and includes, for example, besides information indicating the presence or absence of columns, information indicating the number of columns when there are columns, and information indicating the presence or absence of handwritten characters. The information on characters included in image data mentioned here is information necessary for performing OCR processing of the characters included in the image data, and includes not only information on the notation aspect of the characters, but also information indicating whether the language of the characters is Japanese or a foreign language. In the case where the language of the characters is a foreign language, the information may include information necessary for translation processing, such as information indicating a specific language such as English.
The column handling field 50d and the handwritten-character handling field 50e are examples of information on characters included in image data, and are examples of information on the notation aspect of the characters. The translation handling field 50f of the attribute information 50 is an example of information on characters included in image data.
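The attribute information 50 can likewise be sketched as one record per server apparatus; the following is a minimal sketch, and the values are invented placeholders since the actual figure values are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class AttributeInfo:
    index: int                 # index field 50a: serial number given by unit 12
    confidence: float          # confidence level field 50b
    usage_amount: float        # usage amount field 50c: unit usage fee per page
    handles_columns: bool      # column handling field 50d
    handles_handwriting: bool  # handwritten-character handling field 50e
    handles_translation: bool  # translation handling field 50f

attribute_50 = [
    AttributeInfo(1, 0.90, 0.5, True, False, True),   # server apparatus 20
    AttributeInfo(2, 0.85, 0.3, True, True, False),   # server apparatus 30
    AttributeInfo(3, 0.95, 0.8, False, True, True),   # server apparatus 40
]
```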
Here, a method of obtaining, by the information obtaining unit 12 (see
The UI 60 illustrated in
An exemplary screen of the UI 60 illustrated in
Indices 2 and 3 are selected from among indices 1 to 3. The state in which indices 2 and 3 are selected is indicated by broken-line frames.
After that, the user may press a “Next” button illustrated in
The UI 60 illustrated in
More specifically, “2” is already displayed in the index field of an input region 61 of the exemplary screen mentioned here, reflecting the selection result illustrated in
When the user finishes entering information into the screen illustrated in
Next, the case where the information obtaining unit 12 (see
In the exemplary process illustrated in
Then, the information obtaining unit 12 requests attribute information from the cloud OCR identified as having no attribute information (step S103). Upon obtaining of the attribute information from the identified cloud OCR, the information obtaining unit 12 saves the obtained attribute information (step S104).
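The following is a minimal sketch of this exemplary process; fetch_attributes() is a hypothetical stand-in for the request to each cloud OCR, and the step correspondence reflects one reading of steps S101 to S104.

```python
saved_attributes = {}  # cloud OCR name -> saved attribute information

def ensure_attributes(cloud_ocrs, fetch_attributes):
    for name in cloud_ocrs:
        if name not in saved_attributes:     # S101/S102: identify cloud OCRs lacking info
            attrs = fetch_attributes(name)   # S103: request attribute information
            saved_attributes[name] = attrs   # S104: save the obtained information
    return saved_attributes

# Usage with a dummy fetcher standing in for the network exchange:
ensure_attributes(["server20", "server30"], lambda name: {"confidence": 0.9})
```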
Next, an exemplary process in the case where the image forming apparatus 10 obtains image data will be described using
In the first example illustrated in
The request destination determination unit 15 transmits all of the image data to the server apparatus 30 and requests OCR processing (step S12). In the server apparatus 30, the processor 22 (see
In the image forming apparatus 10, the processing data reception unit 17 (see
In addition, in the case of transmitting the image data, the request destination determination unit 15 may transmit all of the image data in bulk, or may transmit the image data in units of pages, as in the case of transmitting the image data of the first page and, on receipt of a processing result thereof, transmitting the image data of the second page.
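A minimal sketch of these two transmission modes follows; send_and_receive() is a hypothetical placeholder for the exchange with the requested server apparatus.

```python
def request_ocr(pages, send_and_receive, per_page=False):
    if per_page:
        # transmit page 1, receive its result, then transmit page 2, and so on
        return [send_and_receive([page]) for page in pages]
    # transmit all of the image data in bulk
    return send_and_receive(pages)

# Usage with a dummy exchange that "recognizes" each page:
results = request_ocr(["page1", "page2"],
                      lambda batch: [p.upper() for p in batch],
                      per_page=True)
```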
In the second example illustrated in
The request destination determination unit 15 (see
Note that, even if the request unit is a part of the image data, it is also conceivable that any one of the server apparatuses 20 to 40 is determined as the request destination.
The server apparatuses 20 to 40 perform OCR processing of the received image data (steps S23-1, S23-2, and S23-3) and transmit the processing results to the image forming apparatus 10 (steps S24-1, S24-2, and S24-3).
On the basis of the received processing results, the image forming apparatus 10 generates and processes an output document or an output document file, as in the first example.
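How page units are apportioned among the request destinations is not fixed by the description above; the sketch below assumes a simple round-robin assignment purely for illustration.

```python
def assign_pages(pages, destinations):
    # distribute page units among the request destinations in turn
    assignment = {destination: [] for destination in destinations}
    for i, page in enumerate(pages):
        assignment[destinations[i % len(destinations)]].append(page)
    return assignment

print(assign_pages(["p1", "p2", "p3"], ["server20", "server30", "server40"]))
# {'server20': ['p1'], 'server30': ['p2'], 'server40': ['p3']}
```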
Next, a first exemplary embodiment will be described using
In the first exemplary embodiment, as illustrated in
The user's presetting mentioned here may be included in information indicating processing of image data obtained by the image data obtaining unit 11 of the image forming apparatus 10, or may be information set in advance by the image forming apparatus 10.
Specifically, whether speed priority has been selected is determined by referring to the setting information 90 (see
If speed priority has not been selected (No in step S202), it is checked whether reproducibility priority has been selected (step S203). If reproducibility priority has been selected (Yes in step S203), the process proceeds to step S209 described below for processing using a cloud OCR.
In the case where reproducibility priority has not been selected (No in step S203), an analysis process using a built-in OCR is performed (step S204). Details will be described later with reference to
After the analysis process using a built-in OCR (step S204), the request destination determination unit 15 (see
In the case where it is determined to perform processing using a built-in OCR (Yes in step S205), processing using the OCR unit 16 (see
After that, it is determined whether the processing has been completed for all pages of image data (step S207), and, if it is not completed (No in step S207), the process returns to step S201; and, if it is completed (Yes in step S207), the output document generation unit 18 (see
In the first example, since the request unit is all of the image data, the request destination determined on the first page is also applied to subsequent pages. In contrast, in the case of the second example, since the request unit is a part of the image data, the request destination is determined for each page.
In the case of performing processing using a cloud OCR (Yes in step S203 or No in step S205), when a cloud OCR selection process is performed (step S209), the request destination determination unit 15 (see
When the processing is completed using the cloud OCR at the request destination and the result is transmitted, the processing data reception unit 17 receives the cloud processing result (step S211). When the cloud processing result is received, the process proceeds to step S207.
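Gathering the branches of steps S201 to S211, the per-page flow can be sketched as follows; the helper callables are hypothetical stand-ins, the settings object follows the SettingInfo sketch above, and decide_builtin corresponds to the analysis process sketched after its description below.

```python
def process_page(page, settings, builtin_ocr, cloud_ocr_request, decide_builtin):
    if settings.speed_priority:               # S202: speed priority selected
        return builtin_ocr(page)              # S206: process using the built-in OCR
    if settings.reproducibility_priority:     # S203: reproducibility selected
        return cloud_ocr_request(page)        # S209-S211: process using a cloud OCR
    if decide_builtin(page):                  # S204-S205: analysis by built-in OCR
        return builtin_ocr(page)              # S206
    return cloud_ocr_request(page)            # S209-S211
```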
Next, the analysis process using a built-in OCR in step S204 described above will be described with reference to the flow illustrated in
In the analysis process using a built-in OCR illustrated in
That is, in accordance with the analysis result of the image data, it is determined whether the image data includes non-Japanese characters (step S302), whether the image data includes handwritten characters (step S303), whether the number of illustration areas is greater than or equal to a threshold N1 (step S304), whether the number of character areas is greater than or equal to a threshold N2 (step S305), whether the number of columns is greater than or equal to a threshold N3 (step S306), and whether the number of ruled lines is greater than or equal to a threshold N4 (step S307).
Note that these thresholds N1 to N4 are preset by the user. The thresholds N1 to N4 may be set individually for each item of obtained image data, or may be set once and then applied uniformly to the obtained image data.
If none of the above determinations in steps S302 to S307 is applicable, the image data subjected to the determinations is regarded as data that is processable by a built-in OCR, and it is determined to perform processing using a built-in OCR (step S308).
In contrast, if any of the above determinations in steps S302 to S307 is applicable, the image data subjected to the determinations is regarded as data that is not processable by a built-in OCR, and it is determined to perform processing using a cloud OCR (step S309).
After the above determination, the process returns to step S210 (see
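A minimal sketch of the determinations in steps S302 to S307 follows, assuming hypothetical analysis-result fields produced by the document analysis unit 13 and user-preset thresholds N1 to N4.

```python
def decide_builtin(result, n1, n2, n3, n4):
    use_cloud = (result["has_non_japanese"]             # S302
                 or result["has_handwriting"]           # S303
                 or result["illustration_areas"] >= n1  # S304
                 or result["character_areas"] >= n2     # S305
                 or result["columns"] >= n3             # S306
                 or result["ruled_lines"] >= n4)        # S307
    # True -> S308 (processable by the built-in OCR); False -> S309 (cloud OCR)
    return not use_cloud
```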
Next, the cloud OCR selection process in step S209 described above will be described with reference to the flow illustrated in
In the cloud OCR selection process illustrated in
More specifically, it is determined whether there is any attribute information 50 whose usage amount field 50c indicates a per-page usage amount, for processing the to-be-processed image data, that is within the usable amount set in advance by the user. In addition, in the case where there are columns in the to-be-processed image data, it is determined whether there is any attribute information 50 where “True” is included in its column handling field 50d; in the case where the processing target includes handwritten characters, it is determined whether there is any attribute information 50 where “True” is included in its handwritten-character handling field 50e; and in the case where the processing target includes non-Japanese characters, it is determined whether there is any attribute information 50 where “True” is included in its translation handling field 50f.
Then, the request destination determination unit 15 determines whether there are corresponding indices (step S402). If there are corresponding indices (Yes in step S402), the process selects a cloud OCR with the highest value of the confidence level in the confidence level field 50b (see
If there are no corresponding indices (No in step S402), it means that there is no cloud OCR capable of performing the processing, and an error display is performed (step S404). Note that such an error display may be, for example, a message with the contents “There is no cloud OCR capable of performing processing”. Moreover, in the case of an error display, the user may be instructed to relax the conditions of the setting information 90 (see
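The selection in steps S401 to S404 can be sketched as a filter followed by a maximum over the confidence level, reusing the AttributeInfo records sketched earlier; the constraint parameter names are assumptions.

```python
def select_cloud_ocr(attrs, needs_columns, needs_handwriting, needs_translation,
                     usable_amount):
    candidates = [a for a in attrs                      # S401: narrow down the indices
                  if a.usage_amount <= usable_amount
                  and (not needs_columns or a.handles_columns)
                  and (not needs_handwriting or a.handles_handwriting)
                  and (not needs_translation or a.handles_translation)]
    if not candidates:                                  # No in S402
        # S404: error display
        raise RuntimeError("There is no cloud OCR capable of performing processing.")
    # S403: select the cloud OCR with the highest confidence level (field 50b)
    return max(candidates, key=lambda a: a.confidence)
```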
Next, a second exemplary embodiment will be described with reference to
The exemplary screen of the UI 60 in
The check boxes 71a to 73c indicate whether their corresponding items of image data 71 to 73 are selected as targets to be processed.
In the case of
Next, a third exemplary embodiment will be described with reference to
The exemplary screen of the UI 60 in
The range 81a is marked with circled one (hereinafter referred to as <1>) as number 82. In addition, the range 81b is marked with circled two (hereinafter referred to as <2>) as number 82, and the range 81c is marked with circled three (hereinafter referred to as <3>) as number 82.
These numbers 82 mentioned here are arranged vertically on the right side of the image data 81.
The exemplary screen illustrated in
When the user finishes operating the OK button 84 or entering a correction in the input field 85 for each of <1> to <3> of the image data 81, the user operates “Next” to allow the output document generation unit 18 (see
Note that <1> to <3> illustrated in
The other example illustrated in
Moreover, on the exemplary screen of the UI 60 illustrated in
When the user finishes checking the range 81a, the user may operate “Next” to check the remaining ranges 81b and 81c sequentially.
In the embodiments above, the term “processor” refers to hardware in a broad sense. Examples of the processor include general processors (e.g., CPU: Central Processing Unit) and dedicated processors (e.g., GPU: Graphics Processing Unit, ASIC: Application Specific Integrated Circuit, FPGA: Field Programmable Gate Array, and programmable logic device).
In the embodiments above, the term “processor” is broad enough to encompass one processor or plural processors in collaboration which are located physically apart from each other but may work cooperatively.
The order of operations of the processor is not limited to one described in the embodiments above, and may be changed. The foregoing description of the exemplary embodiments of the present disclosure has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical applications, thereby enabling others skilled in the art to understand the disclosure for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the disclosure be defined by the following claims and their equivalents.
Number | Date | Country | Kind
2022-007061 | Jan 2022 | JP | national