This application is based on and claims priority under 35 USC 119 from Japanese Patent Application No. 2021-050772 filed Mar. 24, 2021.
The present invention relates to an information processing apparatus and a non-transitory computer readable medium storing an information processing program.
JP2002-170079A discloses a document format identification device. The document format identification device includes a creation unit that creates document format data used to identify a document format on the basis of the feature quantity of a document image, a storage unit that stores the document format data, a determination unit that uses the creation unit to obtain document format data from an image of a document of which a document format is to be identified, compares this document format data with the document format data stored in the storage unit, and determines whether or not the document format data has a similarity relationship, a similarity information extraction unit that extracts similarity information representing the state of similarity between the document of which a document format is to be identified and the document stored in the storage unit in a case where the determination unit determines that the document format data has similarity, and an identification unit that calculates the similarity of the document format on the basis of the similarity information extracted by the similarity information extraction unit and the document format data and identifies the document format of the document to be identified.
In a case where form information is extracted from a form image obtained from the reading of a form in which form information including at least one of a predetermined item or an item value is written, a reference form corresponding to the form image is identified from the extracted form information, and the reference position of form information of the reference form is revised to a revised position by user's designation, the form information cannot be extracted at the designated revised position in a case where the distortion of the form image, such as the deviation of the position of the form image, is more than the distortion of the reference form.
Aspects of non-limiting embodiments of the present disclosure relate to an information processing apparatus and a non-transitory computer readable medium storing an information processing program that may extract form information even though a read form image is distorted in a case where a reference position of form information of a reference form is revised to a revised position by user's designation.
Aspects of certain non-limiting embodiments of the present disclosure overcome the above disadvantages and/or other disadvantages not described above. However, aspects of the non-limiting embodiments are not required to overcome the disadvantages described above, and aspects of the non-limiting embodiments of the present disclosure may not overcome any of the disadvantages described above.
According to an aspect of the present disclosure, there is provided an information processing apparatus including a processor configured to acquire a form image that is obtained from reading of a form in which form information including at least one of a predetermined item or an item value is written, extract the form information from the acquired form image, identify a reference form corresponding to the form image from the extracted form information, correct a revised position using a difference between a position of form information, of which a corresponding reference position is not revised, of the extracted form information and a reference position corresponding to the position, in a case where the reference position of form information of the reference form is revised to the revised position by user's designation, and extract the form information at the corrected revised position.
Exemplary embodiment(s) of the present invention will be described in detail based on the following figures, wherein:
An example of an exemplary embodiment of a disclosed technique will be described in detail below.
Further, an operation unit 12, a display unit 13, a communication unit 14, and a storage unit 15 are connected to the I/O 11D.
The operation unit 12 includes, for example, a mouse and a keyboard.
The display unit 13 is formed of, for example, a liquid crystal display or the like.
The communication unit 14 is an interface that is used for data communication with an external device.
The storage unit 15 is formed of a non-volatile external storage device, such as a hard disk, and stores an information processing program 15A, a form database (DB) 15B, and a reference form database (DB) 15D to be described later, and the like. The CPU 11A reads the information processing program 15A, which is stored in the storage unit 15, into the RAM 11C and executes the information processing program 15A.
The form acquisition unit 20 acquires the image data of an image, which are obtained from the reading of a form, from an image reading apparatus 30 and stores the acquired image data in the form DB 15B.
For example, an image forming device or the like that includes a scanning function, a printer function, a FAX function, and the like is applied as the image reading apparatus 30. The form acquisition unit 20 may acquire the image data of a form from computer devices, such as a server computer and a personal computer.
The form is a document in which information is written in predetermined items. Examples of such a document include documents and the like in which information necessary for business or transactions is written, such as books, slips, applications, invoices, and application forms, but are not limited thereto. Writing includes not only a case where a user writes on the form by hand but also a case where a user inputs information to the form using a computer.
Further, the item of the form means a clue in a case where information written in the form is extracted from the image data of an image obtained from the reading of the form, that is, a specific character serving as a “key”. Accordingly, the item of the form is referred to as a “key” in the following description. Furthermore, information representing the contents of the form is referred to as “item values” or “values”. There are two types of values, that is, a value that has a corresponding key and a value that has no corresponding key. There may also be a key that has no corresponding value. In the following description, information including at least one of a key or a value may be referred to as form information.
In a case where a form is, for example, an invoice shown in
In the case of an example shown in
The key/value-extracting unit 21 executes optical character recognition processing (OCR) for the image data of the form that are acquired by the form acquisition unit 20. Characters included in the form and information about the positions of the characters on the form are obtained as character recognition results from the optical character recognition processing.
Further, the key/value-extracting unit 21 extracts the characters of keys and values and the positions of the characters on the form according to a predetermined extraction condition. Here, a character means both the case of one character and the case of a character string formed of a plurality of characters.
In a case where the character of a key to be extracted and a value corresponding to the key to be extracted exist, the read position and the like of the value are defined in the extraction condition. The read position of the value is defined as information about a position relative to the key, and, for example, at least one of information about a direction relative to the key, such as the right side or the lower side of the key, or information about a distance from the key is defined.
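As an illustration of such an extraction condition, the following minimal sketch looks up a value by a direction and a maximum distance relative to its key. The `Word` record, the `find_value` function, the alignment test, and the sample coordinates are all hypothetical and are not taken from the exemplary embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One character recognition result: text plus its circumscribed rectangle."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def find_value(words: List[Word], key_text: str, direction: str,
               max_distance: float) -> Optional[Word]:
    """Return the nearest word lying in `direction` from the key, or None."""
    key = next((wd for wd in words if wd.text == key_text), None)
    if key is None:
        return None
    candidates = []
    for wd in words:
        if wd is key:
            continue
        if direction == "right":
            gap = wd.x - (key.x + key.w)          # horizontal gap to the key
            aligned = abs(wd.y - key.y) <= key.h  # roughly on the same row
        elif direction == "below":
            gap = wd.y - (key.y + key.h)          # vertical gap to the key
            aligned = abs(wd.x - key.x) <= key.w  # roughly in the same column
        else:
            raise ValueError(f"unsupported direction: {direction}")
        if aligned and 0 <= gap <= max_distance:
            candidates.append((gap, wd))
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]


# Assumed sample data, for illustration only.
words = [
    Word("invoice number", 10, 10, 80, 12),
    Word("A-1234", 100, 10, 40, 12),
    Word("deadline", 10, 40, 50, 12),
]
value = find_value(words, "invoice number", "right", max_distance=50)
```

Because the read position is expressed relative to the key, the lookup still works when the whole form is shifted, as long as the key itself is recognized.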
In a case where the read position of the value corresponding to, for example, the key “Dear Sir” is defined to be positioned on the left side of the key and the form of the invoice shown in
The form identification unit 22 identifies the format of the form by analyzing the image data of the form that are acquired by the form acquisition unit 20. Further, the form identification unit 22 generates extraction result information as necessary as information that is necessary to identify the format of the form, and registers the generated extraction result information in the reference form DB 15D.
Here, the format of a form means a format that is determined depending on the positions of keys and values of the form. For example, in a case where the types of forms, such as an invoice and a statement of delivery, are different from each other, the formats of the forms are different from each other. Further, in a case where the position of at least one key or value among keys and values included in forms is different even though the types of the forms are the same, the formats of the forms are different from each other. In a case where the positions of corresponding keys and values are completely the same in two forms, that is, a form as a reference and a form as an object to be determined at the time of determination of the formats of the forms, the forms are determined as forms having the same format in this exemplary embodiment. On the other hand, in a case where the position of at least one key or value among the positions of corresponding keys and values is different in two forms, the forms are determined as forms having different formats.
The form identification unit 22 identifies the format of the form by determining whether or not the form from which the keys and the values have been extracted by the key/value-extracting unit 21 and a reference form of which the extraction result information is registered in the reference form DB 15D are the same.
The revision processing unit 23 causes a user to confirm the positions and sizes of the keys and the values of the reference form and accepts revision. Then, in a case where the revision processing unit 23 accepts revision, the revision processing unit 23 reflects a revised position and a revised size in the reference form DB 15D. Further, the revision processing unit 23 executes the optical character recognition processing at the revised position.
The revised position-correcting unit 24 detects the distortion of a read form image, and corrects the revised position according to the detected distortion.
The image data of the form acquired by the form acquisition unit 20 are accumulated in the form DB 15B. Information about the keys and the values extracted by the key/value-extracting unit 21 is registered in a KV extraction result DB 15C as key/value extraction results. The key/value extraction results obtained from the key/value-extracting unit 21 are registered in the reference form DB 15D as the extraction result information of the reference form, and are used to determine whether or not the format of a form as an object to be processed is the same as the format of the reference form.
The form DB 15B, the KV extraction result DB 15C, and the reference form DB 15D are provided in the information processing apparatus 10 in this exemplary embodiment, but at least some of these databases may be provided in, for example, an external device connected to a network.
Next, the action of the information processing apparatus 10 according to this exemplary embodiment will be described.
In Step S100, the CPU 11A acquires the image data of a form read by the image reading apparatus 30, that is, a form to be processed.
In Step S102, the CPU 11A extracts keys and values by analyzing the image data of the form to be processed, which are acquired in Step S100, using a publicly known technique. Then, the CPU 11A registers the extraction result information of the extracted keys/values, that is, information representing the characters and positions of the keys and the values in the KV extraction result DB 15C.
In Step S104, the CPU 11A identifies the type of the form on the basis of the extraction results of the keys and the values extracted in Step S102.
In this exemplary embodiment, the CPU 11A identifies the type of the form using cosine similarity. Here, a case where the key/value extraction result information of forms B to E as reference forms is already registered in the reference form DB in processing of Step S108 to be described later and a form to be processed is a form A as shown in
With regard to the cosine similarity, data having n elements are represented by n-dimensional space vectors and similarity is represented by an angle between the vectors. In a case where two data to be compared with each other are denoted by x and y, cosine similarity cos θ is represented by the following equation.
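For reference, the standard definition of cosine similarity between two n-dimensional vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n), which Equation (1) is assumed to follow, can be written as:

```latex
\cos\theta
  = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
  = \frac{\sum_{i=1}^{n} x_i y_i}
         {\sqrt{\sum_{i=1}^{n} x_i^{2}} \; \sqrt{\sum_{i=1}^{n} y_i^{2}}}
  \tag{1}
```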
Cosine similarity cos θ has a value in the range of −1 to +1, and similarity is higher as the cosine similarity cos θ is closer to +1.
For example, in this exemplary embodiment, the positions of keys and values of two forms to be compared with each other are represented by n-dimensional space vectors and cosine similarity is calculated by Equation (1).
Specifically, cosine similarity between the form A, which is the form to be processed, and each of the forms B to E, which are the reference forms, is calculated. Then, among the reference forms that have the calculated cosine similarity equal to or larger than a predetermined threshold value, the format of the reference form having the highest cosine similarity is identified as the format of the form A. The threshold value is set to a value where the two forms can be determined as forms having the same format in a case where the calculated cosine similarity is equal to or larger than the threshold value and the two forms can be determined as forms having different formats in a case where the calculated cosine similarity is less than the threshold value.
In a case where cosine similarity is calculated, the positions of all keys and values included in the form do not need to be used; it is sufficient to use the positions of at least two keys or values among all the keys and the values included in the form.
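The identification described above can be sketched as follows. This is a minimal illustration, assuming that each form is represented by a flat vector of key/value coordinates of equal length; the function names, the vectors, and the threshold values are hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(x: List[float], y: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def identify_format(target: List[float], references: Dict[str, List[float]],
                    threshold: float) -> Optional[str]:
    """Return the reference form with the highest similarity at or above
    `threshold`, or None when no reference form reaches the threshold."""
    scores = {name: cosine_similarity(target, vec)
              for name, vec in references.items()}
    passing = {name: s for name, s in scores.items() if s >= threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)


# Assumed position vectors (x, y of one key followed by x, y of one value).
form_a = [10, 10, 100, 10]
references = {
    "B": [100, 10, 10, 100],
    "C": [10, 12, 100, 10],
}
matched = identify_format(form_a, references, threshold=0.8)
```

A result of `None` corresponds to the case where no registered reference form has the same format as the form to be processed.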
For example, in a case where a threshold value, which is a reference used to determine whether or not forms are the same, is set to 0.8, the form having cosine similarity of 0.8 or more is only the form C having cosine similarity of 0.913 in an example shown in
In Step S106, the CPU 11A determines whether or not there is a reference form having the same format as the form to be processed in the form identification processing of Step S104. Then, in a case where there is no reference form having the same format as the form to be processed, the processing proceeds to Step S108. In a case where there is a reference form having the same format as the form to be processed, the processing proceeds to Step S110.
In Step S108, the CPU 11A registers the extraction result information of the keys/values, which are extracted in Step S102, in the reference form DB 15D as a reference form. For example, in a case where cosine similarity between the forms A and C is less than 0.8 in the example shown in
For convenience of description, processing of Steps S110 and S112 will be described later and processing of Step S114 to be executed after Step S108 will be described.
In Step S114, the CPU 11A executes confirmation/revision processing of causing a user to confirm and revise the positions of the values of the form. For example, in a case where the position of a value corresponding to “invoice number”, which is a key, is defined as a position next to the right side of the key as the extraction condition for keys/values in the case of the invoice shown in
Accordingly, an edit screen in which the positions of values included in the form can be edited is displayed on the display unit 13. In this case, the edit screen is displayed so that sets of keys and values automatically extracted are understood. For example, in a case where ranges specified from the positions of keys and values, for example, circumscribed regions are displayed so as to be surrounded by frames as shown in
In the example shown in
In a case where an operation for revising the position of the value to a revised position is executed, the coordinates, that is, at least one coordinate of the X coordinate or the Y coordinate of the value subjected to revision is updated. Further, in a case where the size of a character is different from the size of the frame, a user may change the size of the frame by a predetermined operation. In this case, the size, that is, at least one of the width or height of the rectangular region of the value is updated according to the user's operation for changing the size of the frame. The position of the value has been described here by way of example, but the position of the key can also be revised likewise.
In Step S116, the CPU 11A registers extraction result information, in which the position of the value subjected to revision by the user is reflected, in the reference form DB 15D. The CPU 11A registers extraction result information, which is not yet subjected to revision, and extraction result information, which is subjected to revision, in the reference form DB 15D as a set. Accordingly, in a case where this routine is executed the next time or later, values are extracted on the basis of the extraction result information subjected to revision. Therefore, for example, in a case where a read form is an invoice shown in
Next, in Step S110, the CPU 11A executes revised position-correction processing shown in
In a case where the form is distorted but a key can be extracted, the value can be extracted without any problem because the position of the value is defined as a position relative to the position of the key. However, in a case where the position of the value is manually revised by a user in the processing of Step S114, the position of the value is registered in the reference form DB 15D as an absolute position. For this reason, in a case where the read form is distorted, a value of which the position is manually revised may not be extracted accurately.
In a case where a form to be processed is a form F of an invoice shown in
Further, in a case where the form F becomes an object to be processed again after the position of the value is revised, it is assumed that the form F is read in a state where the form F is rotated as shown by a broken line H of
On the other hand, since the position of the value corresponding to “deadline” is manually revised to an absolute position, a value is extracted at the revised position regardless of the distortion of the form F. However, since “2021 Jan. 31”, which is an actual value, is deviated to the lower right side as shown in
Accordingly, the revised position of a value is corrected on the basis of the amounts of deviation in the positions of other keys/values in the revised position-correction processing of
In Step S200, the CPU 11A determines whether or not there is a key or a value of which at least one of the coordinate or the size is revised among the keys and values that are included in the reference form and identified in Step S104 of
In Step S202, the CPU 11A calculates a distance between the revised position of the key or the value of which at least one of the coordinate or the size is revised and each of the positions of other keys and values. In the case of an example shown in
In Step S204, the CPU 11A specifies a key or a value corresponding to the shortest distance among the distances calculated in Step S202. The shortest distance from the revised position is used in this exemplary embodiment, but the present invention is not limited thereto. For example, an average or the like may be used as a value representing the tendency of the distances calculated in Step S202.
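Steps S202 and S204 might be sketched as follows, assuming that each position is a single (x, y) point and that the distance is Euclidean; the exemplary embodiment does not state either point, and the names and coordinates below are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Tuple

Point = Tuple[float, float]


def distances_from(revised_pos: Point, others: Dict[str, Point]) -> Dict[str, float]:
    """Step S202 (sketch): distance from the revised position to every other key/value."""
    return {name: math.dist(revised_pos, pos) for name, pos in others.items()}


def nearest(distances: Dict[str, float]) -> Tuple[str, float]:
    """Step S204 (sketch): the key/value at the shortest distance.
    Raises ValueError when `distances` is empty."""
    name = min(distances, key=distances.get)
    return name, distances[name]


# Assumed sample positions, for illustration only.
revised_position = (100.0, 50.0)
other_positions = {
    "invoice number": (10.0, 10.0),
    "total": (100.0, 80.0),
    "date": (160.0, 50.0),
}
dists = distances_from(revised_position, other_positions)
nearest_name, shortest = nearest(dists)
average_distance = mean(dists.values())  # the alternative representative value
```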
In Step S206, the CPU 11A calculates distances between the positions of keys and values of which the positions are not revised among the keys and values included in the form to be processed and the positions (reference positions) of keys and values of a corresponding reference form, respectively. Then, the CPU 11A calculates a variance of the respective calculated distances.
In the example shown in
Here, since a case where the variance is large means that a variation of the distances is large, there is a high possibility that the read form has been rotated. On the other hand, since a case where the variance is small means that a variation of the distances is small, there is a high possibility that the read form has been translated or has not been moved.
Accordingly, in Step S208, the CPU 11A determines whether or not the variance calculated in Step S206 is equal to or less than a predetermined variance threshold value TH1. The variance threshold value TH1 is set to a value where it can be determined that the read form is translated in a case where the variance is equal to or less than the threshold value TH1.
Then, in a case where the variance is equal to or less than the variance threshold value TH1, that is, in a case where the form is considered to be translated, the processing proceeds to Step S210. In a case where the variance is larger than the threshold value TH1, that is, in a case where the form is considered to be rotated, the processing proceeds to Step S212.
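The determination in Steps S206 and S208 can be illustrated with the following sketch. It assumes Euclidean distances and the population variance, neither of which is specified in the exemplary embodiment; the function names, the sample positions, and the value chosen for TH1 are hypothetical.

```python
import math
from statistics import pvariance
from typing import Dict, Tuple

Point = Tuple[float, float]


def displacement_variance(observed: Dict[str, Point],
                          reference: Dict[str, Point]) -> float:
    """Variance of the distances between each unrevised key/value of the
    processed form and its reference position (needs at least one pair)."""
    common = observed.keys() & reference.keys()
    distances = [math.dist(observed[k], reference[k]) for k in common]
    return pvariance(distances)


def classify_distortion(variance: float, th1: float) -> str:
    """Small variance: uniform shift. Large variance: likely rotation."""
    return "translated" if variance <= th1 else "rotated"


def rotate(p: Point, degrees: float) -> Point:
    """Rotate a point about the origin (used only to build sample data)."""
    t = math.radians(degrees)
    return (p[0] * math.cos(t) - p[1] * math.sin(t),
            p[0] * math.sin(t) + p[1] * math.cos(t))


# Assumed reference positions and two distorted readings of the same form.
reference = {"a": (10.0, 10.0), "b": (100.0, 10.0), "c": (10.0, 100.0)}
shifted = {k: (x + 5.0, y + 5.0) for k, (x, y) in reference.items()}
turned = {k: rotate(p, 10.0) for k, p in reference.items()}

TH1 = 1.0  # assumed variance threshold
shifted_kind = classify_distortion(displacement_variance(shifted, reference), TH1)
turned_kind = classify_distortion(displacement_variance(turned, reference), TH1)
```

With a pure translation every key and value moves by the same distance, so the variance is close to zero; with a rotation, points far from the center of rotation move farther than points near it, which raises the variance.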
In Step S212, the CPU 11A determines whether or not the shortest distance specified in Step S204 is less than a predetermined distance threshold value TH2. The distance threshold value TH2 is set to a value where a position can be corrected in processing of Step S214 to be described later in a case where the shortest distance is less than the distance threshold value TH2 and it can be determined that it is better to manually revise a position by a user in a case where the shortest distance is equal to or larger than the distance threshold value TH2.
Then, in a case where the shortest distance specified in Step S204 is less than the distance threshold value TH2, the processing proceeds to Step S214. In a case where the shortest distance is equal to or larger than the distance threshold value TH2, the processing returns to the processing of
In Step S214, the CPU 11A calculates the amounts of deviation in position and size. Specifically, for the key or the value specified in Step S204, that is, the key or the value at the shortest distance from the revised position, the CPU 11A calculates the difference between its position in the form to be processed and the position of the corresponding key or value of the reference form as the amount of deviation in position.
Further, the CPU 11A calculates a difference between the size of the key or the value corresponding to the shortest distance from the position of the key or the value, which is specified in Step S204 and is revised, and the size of a key or a value of the corresponding reference form, that is, a difference in width and a difference in height as the amount of deviation in size.
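The deviation amounts described above might be computed as in the following sketch, which also shows one way they could be applied to a revised region. It assumes that the deviation in position is taken per axis (X and Y) and that a region is an axis-aligned rectangle; the `Region` record, the function names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Circumscribed rectangle of a key or a value."""
    x: float
    y: float
    w: float
    h: float


def deviation(observed: Region, reference: Region) -> Region:
    """Per-axis deviation in position and deviation in width/height between
    the nearest key/value as read and its reference counterpart."""
    return Region(observed.x - reference.x, observed.y - reference.y,
                  observed.w - reference.w, observed.h - reference.h)


def correct(revised: Region, dev: Region,
            position: bool = True, size: bool = True) -> Region:
    """Shift and/or resize the manually revised region by the deviation."""
    return Region(
        revised.x + (dev.x if position else 0.0),
        revised.y + (dev.y if position else 0.0),
        revised.w + (dev.w if size else 0.0),
        revised.h + (dev.h if size else 0.0),
    )


# Assumed sample values, for illustration only.
reference_nearest = Region(100.0, 80.0, 40.0, 12.0)  # registered reference position
observed_nearest = Region(106.0, 84.0, 42.0, 12.0)   # same key/value in the read form
revised_region = Region(100.0, 50.0, 60.0, 12.0)     # position revised by the user

dev = deviation(observed_nearest, reference_nearest)
corrected = correct(revised_region, dev)
position_only = correct(revised_region, dev, size=False)
```

The `position` and `size` flags are included only to show that either correction could be applied on its own.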
In the example shown in
In Step S214, the CPU 11A corrects the revised position and the revised size of the circumscribed region of the key or the value on the basis of the amounts of deviation in position and size that are calculated in Step S214. In the example shown in
Returning to
As shown in
Further,
As shown in
A case where both a position and a size are corrected has been described in this exemplary embodiment, but only any one of a position and a size may be corrected.
Further, an aspect in which the information processing program is installed in the storage unit 15 has been described in this exemplary embodiment, but the present invention is not limited thereto. The information processing program 15A according to this exemplary embodiment may be provided in a form where the information processing program 15A is recorded on a computer-readable storage medium. For example, the information processing program 15A according to this exemplary embodiment may be provided in a form where the information processing program according to this exemplary embodiment is recorded on optical discs, such as a Compact Disc (CD)-ROM and a Digital Versatile Disc (DVD)-ROM, or a form where the information processing program according to this exemplary embodiment is recorded on semiconductor memories, such as a Universal Serial Bus (USB) memory and a memory card. Further, the information processing program according to this exemplary embodiment may be acquired from an external device through a communication line connected to the communication unit 14.
In the embodiments above, the term “processor” refers to hardware in a broad sense. Examples of the processor include general processors (e.g., CPU: Central Processing Unit) and dedicated processors (e.g., GPU: Graphics Processing Unit, ASIC: Application Specific Integrated Circuit, FPGA: Field Programmable Gate Array, and programmable logic device).
In the embodiments above, the term “processor” is broad enough to encompass one processor or plural processors in collaboration which are located physically apart from each other but may work cooperatively. The order of operations of the processor is not limited to one described in the embodiments above, and may be changed.
The foregoing description of the exemplary embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2021-050772 | Mar 2021 | JP | national |