This application is based on and claims priority under 35 USC 119 from Japanese Patent Application No. 2020-214342 filed Dec. 23, 2020.
The present disclosure relates to an information processing apparatus and a non-transitory computer readable medium.
Grouping techniques to group forms are available.
Japanese Unexamined Patent Application Publication No. 2006-209261 discloses a received form job display method. The disclosed received form job display method includes a receiving operation to receive a digital image, a recognition operation to recognize the type of a form from the received digital image, a determination operation to determine whether the recognized digital image is a pre-registered form or an unrecognized form, a setting operation to preset a display method of the unrecognized form, and a display operation to display in a list the determined form in accordance with the set display method.
When forms are managed, form data obtained by reading the forms may be grouped. For example, the forms may be grouped on a per attribute basis of the forms identified from the form data in accordance with form definition.
However, the accuracy of recognition of the forms may not be high enough and correct recognition results may not necessarily be obtained. If the attribute of a form is erroneously identified or is not identified at all, the form data is difficult to normally group. To normally group the form data, the user may manually correct the groups of the forms and this job may be time consuming.
Aspects of non-limiting embodiments of the present disclosure relate to providing an information processing apparatus and non-transitory computer readable medium reducing, in the case in which form data is difficult to normally group, user workload in grouping the form data in comparison with when grouping of forms is manually corrected.
Aspects of certain non-limiting embodiments of the present disclosure overcome the above disadvantages and/or other disadvantages not described above. However, aspects of the non-limiting embodiments are not required to overcome the disadvantages described above, and aspects of the non-limiting embodiments of the present disclosure may not overcome any of the disadvantages described above.
According to an aspect of the present disclosure, there is provided an information processing apparatus including a processor configured to: with multiple pieces of form data having attributes of forms being grouped in accordance with form definition information that defines groups respectively with the attributes, receive a change of an attribute of an ungrouped piece of the form data; and re-group the form data in accordance with an attribute responsive to the change and the form definition information.
An exemplary embodiment of the present disclosure will be described in detail based on the following figures, wherein:
An exemplary embodiment is described with reference to the drawings. Like elements and operations are designated with like reference numerals throughout the drawings and the discussion thereof is not duplicated. Dimensions in the drawings are exaggerated for convenience of explanation and differ from the actual proportions.
Referring to
The information processing apparatus 20 controls a series of operations including performing an OCR (optical character recognition) process on image data of a document including multiple forms input via the input device 60 and outputting the results of the OCR process to a predetermined output destination. According to the exemplary embodiment, the information processing apparatus 20 is a server computer. However, the information processing apparatus 20 may be a personal computer (PC) or a smart phone. Specific configuration and operations of the information processing apparatus 20 are described below.
The client terminal 40 transmits to the information processing apparatus 20 a variety of instructions related to the OCR process. The instructions include an instruction to start reading information on the image data and an instruction to display read results of the information on the image data. The client terminal 40 also displays a variety of information including the results of the OCR process that the information processing apparatus 20 has performed in response to the variety of instructions received, and a notification related to the OCR process. For example, the client terminal 40 may be a general-purpose computer, such as a server computer or a personal computer (PC). Although a single client terminal 40 is illustrated in
The input device 60 inputs to the information processing apparatus 20 the image data serving as a target of the OCR process. The input device 60 may be a general-purpose computer, such as a server computer or a PC, or an image forming apparatus having a scan function, printer function, and fax function. Not only the input device 60 but also the client terminal 40 may input the image data to the information processing apparatus 20.
The configuration of the form system 10 is described below.
In the form system 10, the information processing apparatus 20 performs the OCR process on the image data input via the input device 60 and outputs the results of the OCR process to a predetermined output destination.
In the OCR process, the information processing apparatus 20 controls a variety of operations including a business process design and operation verification operation (1), data input operation (2), data reading operation (3), form determination confirmation and correction operation (4), reading result confirmation and correction operation (5), business check operation (6), data output operation (7), and step-back operation (8). According to the exemplary embodiment, the OCR process includes not only a reading operation to read characters and symbols from the image data but also a post-processing operation, such as an operation to correct the read characters.
The information processing apparatus 20 automatically performs the business process design and operation verification operation (1), data input operation (2), data reading operation (3), business check operation (6), and data output operation (7) as examples of control operations. The form determination confirmation and correction operation (4) and reading result confirmation and correction operation (5) are received as examples of control operations when the user inputs them using the client terminal 40. The step-back operation (8) as an example of control operations may be automatically performed by the information processing apparatus 20 or accepted when the user inputs the step-back operation (8) via the client terminal 40.
In the business process design and operation verification operation (1), job rules are produced. The job rules include reading definition setting, output setting, and business check setting. For example, in the reading definition setting, a read range in which information on the image data is read is set in the data reading operation (3). Specifically, a definition of reading an item value as a value to the right of an item extracted as a key may be set. For example, in the output setting, a file format and an output destination of output data to be output in the data output operation (7) are set. In the business check setting, a format, such as an input item in a form serving as a detection target and the number of inputtable characters, in the business check operation (6) is set.
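As a non-limiting illustration of the reading definition described above, the following Python sketch reads an item value as the text located to the right of an item extracted as a key. The `Token` structure, its coordinate fields, the function `value_right_of`, and the tolerance `y_tol` are hypothetical names introduced only for this sketch and are not part of the exemplary embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str   # recognized character string
    x: float    # horizontal position of the token (assumed unit: pixels)
    y: float    # vertical position of the token (assumed unit: pixels)


def value_right_of(tokens: List[Token], key: str, y_tol: float = 5.0) -> Optional[str]:
    """Return the text of the nearest token to the right of the key on the same line."""
    key_token = next((t for t in tokens if t.text == key), None)
    if key_token is None:
        return None  # the key was not extracted
    # Candidates lie to the right of the key and roughly on the same line.
    candidates = [t for t in tokens
                  if t.x > key_token.x and abs(t.y - key_token.y) <= y_tol]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.x).text


tokens = [
    Token("Name", 10, 20), Token("Taro", 80, 21),
    Token("Date", 10, 60), Token("2020-12-23", 80, 60),
]
name_value = value_right_of(tokens, "Name")
```

In this sketch, the read range is reduced to a same-line test with a fixed tolerance; an actual reading definition may specify the range differently.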
In the data input operation (2), the image data is received from the input device 60. The input image data is registered as a job that serves as an execution unit in the data reading operation (3).
In the data reading operation (3), the information on the image data in the job is read by using a job rule of the job to be executed. The job rule is selected by the user from among the job rules produced in the business process design and operation verification operation (1). For example, in the data reading operation (3), the determination of the form (hereinafter referred to as a “form determination”) is performed on the image data in the job and a character and symbol within the read range are read.
In the form determination confirmation and correction operation (4), the image data in the job is grouped into a record indicating a form in the job in accordance with results of the form determination performed in the data reading operation (3). In the form determination confirmation and correction operation (4), the client terminal 40 is caused to display the group and user confirmation and correction are accepted from the client terminal 40.
In the reading result confirmation and correction operation (5), the results of reading the character and symbol in the read range in the data reading operation (3) are displayed and user confirmation and correction to the read results are accepted.
In the business check operation (6), an error in a prior operation is detected in accordance with the business check setting included in the job rule of the job. The job rule is selected by the user from among the job rules produced in the business process design and operation verification operation (1). The detection results may be displayed to the user.
In the data output operation (7), output data is produced using the output setting included in the job rule of the job. The job rule is selected by the user from among the job rules produced in the business process design and operation verification operation (1). The produced output data is output to a predetermined output destination.
In the step-back operation (8), processing steps back from an operation performed in the OCR process by one or more operations. For example, using the client terminal 40, the user may issue an instruction to perform the step-back operation (8) in the middle of the form determination confirmation and correction operation (4) or the reading result confirmation and correction operation (5). Typically, an instruction of the step-back operation (8) may be issued by the client terminal 40 of an administrator in response to results of an administrator check performed between the business check operation (6) and the data output operation (7).
In the OCR process, the business process design and operation verification operation (1) is performed prior to the data reading operation (3), namely, prior to the start of the operation of the form system 10. Alternatively, the business process design and operation verification operation (1) may be performed in the middle of the operation of the form system 10 that performs the data reading operation (3) or subsequent operation. For example, a job rule produced in the business process design and operation verification operation (1) prior to the operation of the form system 10 may be appropriately corrected in response to the results of the reading result confirmation and correction operation (5) with the form system 10 operating.
The form determination confirmation and correction operation (4) of the information processing apparatus 20 is described in detail below.
The form data has an attribute of a form. The form data corresponds to the image data on which the form determination is performed in the data reading operation (3) in
The grouping refers to grouping the form data in accordance with the attribute and form definition information 100.
The attribute refers to a form class, attached file, exclusion flag, and/or serial number.
The form class refers to a combination of the type of a form and a page number of the form. For example, the type of the form may be a document type, such as a transportation expense report or a loan application. The page number of the form is a page number within a document including multiple sheets. If the type of the form is A and the page number is 1, the form class is designated “A1.” For example, “C2” represents a form class indicating that the type of the form is C and the page number is 2. In the following discussion, the form data of a form class A1 is designated form data A1. The form data having an unknown form class is referred to as an unknown form. In the drawings of the disclosure, the form data having an unknown form class is simply referred to as “unknown.”
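The form class notation described above may be sketched in Python as follows. The `FormClass` type and the `parse_form_class` function are hypothetical names used only for illustration, and the sketch assumes that the type of the form is written with letters and the page number with digits.

```python
import re
from typing import NamedTuple, Optional


class FormClass(NamedTuple):
    form_type: str  # type of the form, e.g. "A"
    page: int       # page number of the form, e.g. 1

    def label(self) -> str:
        """Return the designation used in the text, e.g. "A1"."""
        return f"{self.form_type}{self.page}"


def parse_form_class(label: str) -> Optional[FormClass]:
    """Split a designation such as "C2" into type C and page 2.

    Returns None for anything that is not a type followed by a page
    number, which this sketch treats as an unknown form.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", label)
    if match is None:
        return None
    return FormClass(match.group(1), int(match.group(2)))
```

For example, `parse_form_class("C2")` yields the type C and the page number 2, while `parse_form_class("unknown")` yields `None`.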
The attached file is data that is attached to a form. If the attribute is an attached file, the form data is the attached file. The grouping of the attached file is described with reference to
The exclusion flag is used to determine whether a target is to be grouped or not. In the exclusion flag, an on flag represents the state of 1 and an off flag represents the state of 0. The grouping of the form data with the exclusion flag on is described below with reference to
Each piece of the form data is consecutively numbered with a serial number. The numbering and change of the serial number are described below with reference to
The CPU 11 executes a variety of programs and controls each element in the information processing apparatus 20. Specifically, the CPU 11 reads a program from the ROM 12 or the storage 14 and executes the program using the RAM 13 as a working area. In accordance with the program stored on the ROM 12 or the storage 14, the CPU 11 controls the elements and performs a variety of arithmetic processes. According to the exemplary embodiment, an information processing program to execute operations related to the grouping is stored on the ROM 12 or the storage 14. The form definition information 100 is stored on the ROM 12 or the storage 14.
The ROM 12 stores the variety of programs and a variety of data. The RAM 13 temporarily stores the programs or data. The storage 14 includes a hard disk drive (HDD) or a solid-state drive (SSD) and stores a variety of programs including an operating system and a variety of data.
The input unit 15 includes a pointing device, such as a mouse, and a keyboard and is used to enter a variety of inputs. The display 16 is a liquid-crystal display and displays a variety of information. The display 16 may be of a touch panel type and also function as the input unit 15.
The communication interface 17 is used to communicate with another apparatus, such as a database, and complies with standards for Ethernet (registered trademark), fiber-distributed data interface (FDDI), and/or Wi-Fi (registered trademark).
The process of the list screen 200 is described below.
In step S110, the CPU 11 acquires the form data. Specifically, the CPU 11 acquires as the form data the image data that has been determined in the data reading operation (3).
In step S120, the CPU 11 initializes an exclusion flag. The initialization means that the exclusion flag is set to off.
In step S130, the CPU 11 numbers each piece of the form data in the order of arrangement with a serial number. For example, the CPU 11 numbers each piece of the form data in the order of acquisition in step S110 with the serial number.
In step S140, the CPU 11 performs the grouping process. The contents of the grouping process are described with reference to
In step S150, the CPU 11 receives a change of an attribute. For example, the CPU 11 receives a change operation to the attribute by a user via the input unit 15. The change operation of the attribute by the user is described with reference to
In step S160, the CPU 11 cancels the grouping of a group to which the form data having received the change of the attribute therefor belongs.
In step S170, the CPU 11 performs the grouping process. Specifically, the CPU 11 re-groups the form data. The contents of the re-grouping are described with reference to
In step S180, the CPU 11 determines whether to end the process. If the CPU 11 determines that the process is to be ended (yes path in step S180), the CPU 11 ends the process. If the CPU 11 determines that the process is not to be ended (no path in step S180), the CPU 11 returns to step S150. For example, if the user clicks an OK button 208 in
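The flow of steps S110 through S180 may be outlined by the following Python sketch, which is offered only as an illustration under stated assumptions. The dictionary keys, the function name `run_list_screen`, the `changes` sequence standing in for user operations, and the `group_fn` callback standing in for the grouping process are all hypothetical. The sketch assumes that the process ends when no further change is received.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Form = Dict[str, object]


def run_list_screen(
    form_classes: List[Optional[str]],
    changes: Iterable[Tuple[int, str, object]],
    group_fn: Callable[[List[Form]], None],
) -> List[Form]:
    # S110 to S130: acquire the form data, set each exclusion flag to off,
    # and number each piece in the order of acquisition.
    forms: List[Form] = [
        {"form_class": c, "excluded": False, "serial": i, "group": None}
        for i, c in enumerate(form_classes, start=1)
    ]
    group_fn(forms)                                # S140: grouping process
    for serial, attribute, value in changes:       # S150: receive a change
        target = forms[serial - 1]
        old_group = target["group"]
        target[attribute] = value
        if old_group is not None:                  # S160: cancel the grouping
            for form in forms:
                if form["group"] == old_group:
                    form["group"] = None
        group_fn(forms)                            # S170: re-grouping
    return forms                                   # S180: end of the process


def toy_group_fn(forms: List[Form]) -> None:
    """Toy stand-in: group an adjacent ungrouped B1 followed by B2."""
    for first, second in zip(forms, forms[1:]):
        if (first["group"] is None and second["group"] is None
                and first["form_class"] == "B1" and second["form_class"] == "B2"):
            first["group"] = second["group"] = first["serial"]


result = run_list_screen(["B1", None], [(2, "form_class", "B2")], toy_group_fn)
```

Here the second piece of form data starts as an unknown form, and the pair is grouped only after its form class is changed to B2.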
In the discussion that follows, S, N, and P represent variables. S is the variable used to determine a start position of the grouping process. N is the variable used as an index in a loop process. P is the variable used as a start position of a grouping determination.
In the discussion that follows, the terms “form data [S]” and “form data [S−1]” are used. The form data [S] represents form data having a serial number S. For example, the form data [S] with S being 3 represents the form data having a serial number 3. The form data [S−1] represents the form data having a serial number (S−1), that is, S minus 1. If S is 3, the form data [S−1] is the form data having a serial number 2. The same is true of N and P.
Via operations in steps S210 through S270 in
In step S210, the CPU 11 substitutes 1 for S.
In step S220, the CPU 11 determines whether form data [S] is present or not. If the form data [S] is present (yes path in step S220), the CPU 11 proceeds to step S230. If the form data [S] is not present (no path in step S220), the CPU 11 proceeds to step S260.
In step S230, the CPU 11 determines whether the form data [S] is grouped. If the form data [S] is determined to be not grouped (no path in step S230), the CPU 11 proceeds to step S240. If the form data [S] is determined to be grouped (yes path in step S230), the CPU 11 proceeds to step S270.
In step S240, the CPU 11 determines whether any group with the form data [S−1] belonging thereto is present. If any group with the form data [S−1] belonging thereto is present (yes path in step S240), the CPU 11 proceeds to step S250. If the CPU 11 determines that any group with the form data [S−1] belonging thereto is not present (no path in step S240), the CPU 11 proceeds to step S260.
In step S250, the CPU 11 substitutes for N the serial number of a leading piece of the form data of the group. The group is a group to which the form data [S−1] belongs. Specifically, the CPU 11 substitutes for N the serial number of the leading piece of the form data of the group to which the form data [S−1] belongs. This is specifically described with reference to
In step S260, the CPU 11 substitutes 1 for N.
In step S270, the CPU 11 substitutes (S+1) for S.
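The determination of the start position in steps S210 through S270 may be sketched in Python as follows. The function name `find_start` and the representation of the form data as a list of group identifiers are hypothetical. The sketch assumes that processing returns to step S220 after step S270, which the description above implies but does not state.

```python
from typing import List, Optional


def find_start(groups: List[Optional[int]]) -> int:
    """Return N, the serial number at which the grouping process starts.

    groups[i] is the identifier of the group to which the form data with
    serial number i + 1 belongs, or None if that form data is ungrouped.
    """
    s = 1                                          # S210
    while True:
        if s > len(groups):                        # S220: form data [S] is absent
            return 1                               # S260
        if groups[s - 1] is not None:              # S230: form data [S] is grouped
            s += 1                                 # S270
            continue
        if s >= 2 and groups[s - 2] is not None:   # S240: [S-1] belongs to a group
            # S250: serial number of the leading piece of that group
            return groups.index(groups[s - 2]) + 1
        return 1                                   # S260
```

For example, with `groups = [1, 1, 1, 4, 4, None]`, the first ungrouped piece has the serial number 6, the preceding piece belongs to the group whose leading piece has the serial number 4, and `find_start` returns 4.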
Via operations in steps S310 through S430 in
In step S305, the CPU 11 substitutes N for P.
In step S310, the CPU 11 starts an iteration process of the grouping loop. The CPU 11 determines whether to iterate the grouping loop in step S430.
In step S320, the CPU 11 determines whether the exclusion flag of form data [N] is off. If the exclusion flag of the form data [N] is determined to be off (yes path in step S320), the CPU 11 proceeds to step S340. If the exclusion flag of the form data [N] is determined to be on (no path in step S320), the CPU 11 proceeds to step S420.
In step S340, the CPU 11 determines whether the form data [N] satisfies the form definition information 100. If the form data [N] is determined to satisfy the form definition information 100 (yes path in step S340), the CPU 11 proceeds to step S350. If the form data [N] is determined to not satisfy the form definition information 100 (no path in step S340), the CPU 11 proceeds to step S410. The form data [N] not satisfying the form definition information 100 means that the form data [N] is an unknown form.
In step S350, the CPU 11 determines whether all the form data [P] through [N] with exclusion flags thereof being off satisfy the definition of the group. If all the form data [P] through [N] with exclusion flags thereof being off are determined to not satisfy the definition of the group (no path in step S350), the CPU 11 proceeds to step S360. If all the form data [P] through [N] with exclusion flags thereof being off are determined to satisfy the definition of the group (yes path in step S350), the CPU 11 proceeds to step S390. A specific example of all the form data satisfying the definition of the group is described below with reference to
In step S360, the CPU 11 determines whether the form data [N] duplicates any of form data [P] through [N−1]. The word duplicate means that one piece of the form data is identical to another piece of the form data in form class. If the CPU 11 determines that the form data [N] does not duplicate any of form data [P] through [N−1] (no path in step S360), the CPU 11 proceeds to step S370. If the CPU 11 determines that the form data [N] duplicates any of form data [P] through [N−1] (yes path in step S360), the CPU 11 proceeds to step S380. The duplication example is described with reference to
In step S370, the CPU 11 determines whether the form data [N] and the form data [N−1] are in the same group. If the CPU 11 determines that the form data [N] and the form data [N−1] are not in the same group (no path in step S370), the CPU 11 proceeds to step S380. If the CPU 11 determines that the form data [N] and the form data [N−1] are in the same group (yes path in step S370), the CPU 11 proceeds to step S420.
In step S380, the CPU 11 substitutes N for P. Specifically, if the CPU 11 determines that the form data [N] duplicates any of form data [P] through [N−1] (yes path in step S360) or that the form data [N] and the form data [N−1] are not in the same group (no path in step S370), the start position of the grouping determination is set to the form data [N].
In step S390, the CPU 11 groups the form data [P] through the form data [N] with the exclusion flags thereof being off.
In step S400, the CPU 11 substitutes (N+1) for P.
In step S410, the CPU 11 determines whether P equals N. If the CPU 11 determines that P equals N (yes path in step S410), the CPU 11 proceeds to step S400. If the CPU 11 determines that P does not equal N (no path in step S410), the CPU 11 proceeds to step S420. P equaling N means that the start position of the grouping determination is an unknown form.
In step S420, the CPU 11 substitutes (N+1) for N.
In step S430, the CPU 11 determines whether the form data [N] is present. If the CPU 11 determines that the form data [N] is present, the CPU 11 returns to step S310. If the CPU 11 determines that the form data [N] is not present, the CPU 11 ends the grouping process.
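The grouping loop of steps S305 through S430 may be sketched in Python as follows. This is an illustration under stated assumptions and not the definitive algorithm of the exemplary embodiment. The names `Form`, `definition_of`, and `run_grouping` are hypothetical. The sketch assumes that the form definition information maps a group name to an ordered list of form classes, that "in the same group" in step S370 means that both form classes are listed in the same group definition, that processing proceeds to step S420 after steps S380 and S400, and it leaves attached files out of consideration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Form:
    form_class: Optional[str]      # e.g. "C1"; None stands for an unknown form
    excluded: bool = False         # exclusion flag (True means on)
    group: Optional[int] = None    # serial number of the group's leading piece


def definition_of(defs: Dict[str, List[str]], form_class: Optional[str]) -> Optional[str]:
    """Return the name of the group definition that lists form_class, if any."""
    for name, classes in defs.items():
        if form_class in classes:
            return name
    return None


def run_grouping(forms: List[Form], defs: Dict[str, List[str]], n: int = 1) -> List[Form]:
    p = n                                                        # S305
    while n <= len(forms):                                       # S310 / S430
        cur = forms[n - 1]
        if not cur.excluded:                                     # S320
            if definition_of(defs, cur.form_class) is None:      # S340: unknown form
                if p == n:                                       # S410
                    p = n + 1                                    # S400
            else:
                # Form data [P] through [N] with the exclusion flag off.
                active = [(i, f) for i, f in enumerate(forms[p - 1:n], start=p)
                          if not f.excluded]
                classes = [f.form_class for _, f in active]
                if any(classes == d for d in defs.values()):     # S350
                    lead = active[0][0]
                    for _, f in active:                          # S390
                        f.group = lead
                    p = n + 1                                    # S400
                else:
                    duplicate = any(f.form_class == cur.form_class
                                    for f in forms[p - 1:n - 1])  # S360
                    same_group = n >= 2 and (
                        definition_of(defs, forms[n - 2].form_class)
                        == definition_of(defs, cur.form_class))  # S370
                    if duplicate or not same_group:
                        p = n                                    # S380
        n += 1                                                   # S420
    return forms
```

Under these assumptions and with the group C defined as C1, C2, and C3, the arrangement C1, C1, C2, C3 groups the second through fourth pieces, whereas the arrangement C1, C1, C2, unknown, C3 is left ungrouped until the unknown form is changed or excluded.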
The form data C1, form data C2, and form data C3 are arranged in this order in
The form data C3, form data C1, and form data C2 are arranged in this order in
The form data C1 and form data C2 are arranged in this order in
The form data C1, form data C1, form data C2, and form data C3 are arranged in this order in
The form data C1, form data C1, form data C2, unknown form and form data C3 are arranged in this order in
The form data D1, form data D2, and attached file are arranged in this order in
The form data D1, attached file, and form data D2 are arranged in this order in
The form data D1 and form data D2 are arranged in this order in
The attached file, form data D1, and form data D2 are arranged in this order in
The form data D1 and attached file are arranged in this order in
Form data B1, form data B2, and attached file are arranged in this order in
The form data B1, attached file, and form data B2 are arranged in this order in
The form data D1, form data D2, and unknown form are arranged in this order in
The form data B1, form data B2, and unknown form are arranged in this order in
The list screen 200 is described with reference to
When the user clicks the re-group button 202, the information processing apparatus 20 groups the form data.
When the user clicks the back button 204A or the next button 204B, the information processing apparatus 20 successively focuses on the ungrouped form icons 2. The click operation of the back button 204A shifts the focus from the currently focused form icon 2 to the preceding ungrouped form icon 2. The click operation of the next button 204B shifts the focus from the currently focused form icon 2 to the next ungrouped form icon 2. This operation is specifically described below with reference to
When the user clicks the operate button 206, the information processing apparatus 20 displays operation candidates including a reset operation and a re-execute operation performed by specifying the form definition information.
When the user selects the reset operation, the information processing apparatus 20 reverts the form data to a state before the change of the attribute is received. By performing the reset operation, the user may start over again.
When the user selects the re-execute operation by specifying the form definition information, the information processing apparatus 20 displays a modification screen 400. The re-execute operation performed by specifying the form definition information is described below with reference to
If the OK button 208 is clicked, the information processing apparatus 20 ends the grouping process.
The unknown form count display region 210 displays the number of unknown forms out of the form data.
The ungrouped form data count display region 212 displays the number of ungrouped pieces of the form data.
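The focus movement by the back button 204A and the next button 204B and the two count display regions may be sketched in Python as follows. The function names and the tuple representation of the form data are hypothetical. The sketch assumes that the focus stays where it is when no further ungrouped piece exists in the requested direction, which the description above does not specify.

```python
from typing import List, Optional, Tuple

# (form_class, group): form_class None is an unknown form, group None is ungrouped.
Entry = Tuple[Optional[str], Optional[int]]


def next_ungrouped(entries: List[Entry], focus: int) -> int:
    """Index of the next ungrouped piece after focus (next button 204B)."""
    for i in range(focus + 1, len(entries)):
        if entries[i][1] is None:
            return i
    return focus  # assumption: no wrap-around


def prev_ungrouped(entries: List[Entry], focus: int) -> int:
    """Index of the preceding ungrouped piece before focus (back button 204A)."""
    for i in range(focus - 1, -1, -1):
        if entries[i][1] is None:
            return i
    return focus  # assumption: no wrap-around


def counts(entries: List[Entry]) -> Tuple[int, int]:
    """Return (number of unknown forms, number of ungrouped pieces)."""
    unknown = sum(1 for form_class, _ in entries if form_class is None)
    ungrouped = sum(1 for _, group in entries if group is None)
    return unknown, ungrouped


entries = [("C1", None), ("B1", 2), ("B2", 2), (None, None), ("C3", None)]
```

With these entries, the unknown form count is 1 and the ungrouped form data count is 3, because the unknown form is itself ungrouped.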
Referring to
The attribute display region 302 displays an attribute of the form data. For example, if the form icon 2 is displayed in a thumbnail of the form data, the attribute display region 302 displays the form class of the form data. The word “unknown” of the unknown form may be displayed in red.
When the user clicks the attribute change button 304, the information processing apparatus 20 displays the attribute candidate display region 306.
The attribute candidate display region 306 displays substitute attributes. The substitute attributes include a form class, attached file, and exclusion flag used on the form definition information 100. If an attribute is selected, the attribute of the form data is changed to the selected attribute. The word “unknown” may be displayed to indicate an “unknown form.”
When the user scrolls the scroll bar 308, the information processing apparatus 20 scrolls a display region of the form display region 300.
An arrow mark Y4 indicates an operation to number the form data in the order of arrangement with serial numbers. The form data is thus successively numbered with serial numbers 1, 2, 3, 4, and 5. The operation denoted by the arrow mark Y4 corresponds to the operation in step S130.
An arrow mark Y6 denotes an operation to change the serial number. The information processing apparatus 20 changes the serial number of the form data B1 from 3 to 4 and the serial number of the form data C3 from 4 to 3. Upon receiving a change operation from the user, the information processing apparatus 20 changes the serial number. A specific example of the change operation is described below with reference to
An arrow mark Y8 denotes a re-arrangement operation to re-arrange the form data in the order of the serial numbers. The information processing apparatus 20 re-arranges the form data in the order of the form data C1, form data C2, form data C3, form data B1, and form data B2.
An arrow mark Y10 denotes the grouping process. The information processing apparatus 20 groups the form data C1, form data C2, and form data C3 as the group C. Similarly, the information processing apparatus 20 groups the form data B1 and form data B2 as the group B.
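The operations denoted by the arrow marks Y4, Y6, and Y8 may be sketched in Python as follows. The function names `swap_serials` and `arrange` are hypothetical, and the sketch assumes for brevity that each form class appears only once so that it may serve as a dictionary key.

```python
from typing import Dict, List


def swap_serials(serials: Dict[str, int], a: str, b: str) -> None:
    """Exchange the serial numbers of two pieces of form data (Y6)."""
    serials[a], serials[b] = serials[b], serials[a]


def arrange(serials: Dict[str, int]) -> List[str]:
    """Re-arrange the form data in the order of the serial numbers (Y8)."""
    return sorted(serials, key=serials.get)


# Y4: serial numbers 1 to 5 assigned in the order of arrangement.
serials = {"C1": 1, "C2": 2, "B1": 3, "C3": 4, "B2": 5}
swap_serials(serials, "B1", "C3")   # B1: 3 -> 4, C3: 4 -> 3
order = arrange(serials)            # C1, C2, C3, B1, B2
```

After the re-arrangement, the pieces of form data of the group C and of the group B are contiguous, which allows the grouping denoted by the arrow mark Y10.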
An arrow mark Y18 denotes an operation to set the exclusion flag to on. Upon receiving an operation to change the exclusion flag, the information processing apparatus 20 sets the exclusion flag to on. The operation to change the exclusion flag is a user operation to select “exclude” in the attribute candidate display region 306 as illustrated in
An arrow mark Y20 denotes the grouping. The form data C1, form data C2, form data C3, and form data B1 with the exclusion flag thereof being off serve as targets of the grouping. The information processing apparatus 20 groups the form data C1, form data C2, and form data C3 as the group C.
Referring to
The buttons include a turn back button 406A, turn next button 406B, and enter button 408. The display regions include a form definition information selection region 402 and form definition information display region 404. The buttons respond to a click operation performed on a mouse serving as the input unit 15.
The form definition information selection region 402 displays definition of a pre-registered group. The information processing apparatus 20 receives the definition of the group selected by the user and displays the contents of the definition on the form definition information display region 404. For example, the form definition information selection region 402 displays as a choice the group A, group B, group C, and group D.
The form definition information display region 404 displays the contents of the definition of the selected group. For example, the contents of the definition of the group are an image corresponding to the form class defined for the group. Referring to
If the user clicks the turn back button 406A or the turn next button 406B, the information processing apparatus 20 successively switches and displays the form icons 2 from one icon to another. Referring to
If the user clicks the enter button 408, the information processing apparatus 20 performs modification to add the definition of the selected group to the form definition information 100. The information processing apparatus 20 then re-groups in accordance with the modified form definition information 100. If the form definition information 100 is modified, the information processing apparatus 20 may be free from re-reading information on the image data.
An arrow mark Y22 denotes the grouping. Since the group C is not defined in the form definition information 100, the unknown forms are arranged as they are.
An arrow mark Y24 denotes the re-grouping performed after the modification of the form definition information 100. The modification of the form definition information 100 is adding the definition of the group C to the form definition information 100. Since the form definition information 100 after being modified defines the group C, the unknown forms are arranged as the form data C1 and form data C3.
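The effect of modifying the form definition information 100 may be sketched in Python as follows. The function name `determine`, the retained labels in `read_labels`, and the contents of the group A and group B definitions are hypothetical. The sketch assumes that the results of the data reading operation (3) are retained as candidate labels, so that the form determination may be repeated against the modified definition without re-reading the image data.

```python
from typing import Dict, List, Optional


def determine(read_label: str, defs: Dict[str, List[str]]) -> Optional[str]:
    """Return the form class if a group definition lists it, else None (unknown)."""
    for classes in defs.values():
        if read_label in classes:
            return read_label
    return None


read_labels = ["C1", "C3"]                       # retained reading results (assumed)
defs = {"A": ["A1"], "B": ["B1", "B2"]}          # the group C is not defined

before = [determine(label, defs) for label in read_labels]   # Y22: unknown forms

defs["C"] = ["C1", "C2", "C3"]                   # add the definition of the group C
after = [determine(label, defs) for label in read_labels]    # Y24: C1 and C3
```

Before the modification both pieces are unknown forms; after the definition of the group C is added, they are determined to be the form data C1 and the form data C3.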
The information processing apparatus 20 of the exemplary embodiment has been described. The disclosure is not limited to the exemplary embodiment. A variety of changes or modifications is possible to the exemplary embodiment.
The algorithm of the grouping process is not limited to step S140 in
The number of attached files present may not necessarily be zero or more for grouping. For example, the information processing apparatus 20 may use one attached file present for grouping. In such a case, the form data D1 and form data D2 in
Referring to
The change operation is not limited to the drag-and-drop operation in
Referring to
The processes described above may be implemented by using dedicated hardware. In such a case, the processes may be implemented by using one or more pieces of hardware.
In the exemplary embodiment above, the term “processor” refers to hardware in a broad sense. Examples of the processor include general processors (e.g., CPU: Central Processing Unit) and dedicated processors (e.g., GPU: Graphics Processing Unit, ASIC: Application Specific Integrated Circuit, FPGA: Field Programmable Gate Array, and programmable logic device).
In the exemplary embodiment above, the term “processor” is broad enough to encompass one processor or plural processors in collaboration which are located physically apart from each other but may work cooperatively. The order of operations of the processor is not limited to one described in the exemplary embodiment above, and may be changed.
The program causing the information processing apparatus 20 to operate may be provided on a computer-readable recording medium or online via a network, such as the Internet. The computer-readable recording media include a universal serial bus (USB) memory, a flexible disk, and a compact disc read only memory (CD-ROM). The program recorded on the computer-readable recording medium is typically transferred to and stored on a memory or storage. The program may be a single piece of application software or may be built in software in each device operating as a function of the information processing apparatus 20.
The foregoing description of the exemplary embodiments of the present disclosure has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical applications, thereby enabling others skilled in the art to understand the disclosure for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the disclosure be defined by the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2020-214342 | Dec 2020 | JP | national |