Embodiments of the present invention relate to the field of text entry technologies, in particular to a method for entering text based on an image.
Entering bills, tables, documents and the like is an important part of the digital management of paper information today. OCR (Optical Character Recognition) is a computer entry technology that converts the text of various bills, newspapers, books, manuscripts and other printed materials into image information by an optical entry manner such as scanning, and then converts the image information into text available for computer entry by a character recognition technology. As one of the main manners of converting paper documents into text available for computer entry, the OCR technology may be applied to the entry and processing of bank bills, archive files and other materials including a large amount of text. At present, a processing speed may reach 60 to 80 bills per minute; the recognition rate of a passbook has reached more than 85%, and the recognition rate of a deposit receipt or a voucher has reached more than 90%. With a recognition rate of more than 85%, the number of data-entry clerks may be decreased by 80%, the workload of operators may be reduced, and duplicated effort may be avoided. However, since 100% accurate recognition has not been achieved, the data-entry clerk still has to manually enter a part of the text content by referring to the paper text, and manually review the recognized content.
Therefore, a method for entering text based on an image, by which a fast entry speed can be achieved, is urgently needed.
In response to the above problems, the present invention proposes a method for entering text based on an image.
According to an aspect of the embodiments of the present invention, a method for entering text based on an image is provided. The method includes: acquiring a recognition parameter corresponding to at least one region in an image, the recognition parameter comprising text contents recognized from the at least one region and location information associated with the at least one region; selecting an entry location in an entry page and acquiring location information corresponding to a selected entry location; and determining text contents to be entered, based on the location information corresponding to the selected entry location and the recognition parameter.
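For illustration only, the recognition parameter described above can be modeled as a simple data structure that pairs the text recognized from each region with that region's location information. The class and field names below (`Region`, `RecognitionItem`) are hypothetical and not part of the claimed method, and the sample values are invented.

```python
from dataclasses import dataclass

@dataclass
class Region:
    # Location information of a region: top-left corner plus width and height.
    x: int
    y: int
    w: int
    h: int

@dataclass
class RecognitionItem:
    # One element of the recognition parameter: the text contents recognized
    # from a region together with that region's location information.
    text: str
    region: Region

# A recognition parameter for an image is then a collection of such items.
recognition_parameter = [
    RecognitionItem("2018-11-20", Region(x=120, y=40, w=200, h=30)),
    RecognitionItem("1,250.00", Region(x=120, y=90, w=160, h=30)),
]
```

This pairing is what later allows an entry location to be linked back to both the image region to display and the text to enter.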
In an embodiment, the acquiring location information corresponding to a selected entry location includes: acquiring a parameter value shared by a plurality of tab pages; and automatically locating, according to the parameter value shared by the plurality of tab pages, a display page to a region corresponding to the selected entry location; wherein the parameter value shared by the plurality of tab pages comprises location information corresponding to the selected entry location.

In an embodiment, the automatically locating, according to the parameter value shared by the plurality of tab pages, a display page to a region corresponding to the selected entry location includes: zooming in on the region corresponding to the selected entry location.
In an embodiment, the acquiring a recognition parameter corresponding to at least one region in an image includes: dividing the image into regions automatically and recognizing text contents in the regions automatically divided.
In an embodiment, the recognizing text contents in the regions automatically divided includes: recognizing text contents in the regions automatically divided by using an OCR technology.
In an embodiment, the recognizing text contents in the regions automatically divided includes: scoring the text contents recognized to identify recognition accuracy.
According to another aspect of the present invention, a device for entering text based on an image is provided. The device includes: an acquiring recognition parameter unit, adapted to acquire a recognition parameter corresponding to at least one region in an image, wherein the recognition parameter comprises text contents recognized from the at least one region and location information associated with the at least one region; a selecting and entering linkage unit, adapted to select an entry location in an entry page and acquire location information corresponding to a selected entry location; and a determining text contents unit, adapted to determine text contents to be entered, based on the location information corresponding to the selected entry location and the recognition parameter.
In an embodiment, the selecting and entering linkage unit is further adapted to: acquire a parameter value shared by a plurality of tab pages; and automatically locate, according to the parameter value shared by the plurality of tab pages, a display page to a region corresponding to the selected entry location; wherein the parameter value shared by the plurality of tab pages comprises location information corresponding to the selected entry location.
In an embodiment, the selecting and entering linkage unit includes a zooming image unit; the zooming image unit is adapted to zoom in on the region corresponding to the selected entry location.
In an embodiment, the acquiring recognition parameter unit includes a dividing image and recognizing unit; the dividing image and recognizing unit is adapted to: divide the image into regions automatically and recognize text contents in the regions automatically divided.
In an embodiment, the dividing image and recognizing unit is further adapted to recognize text contents in the regions automatically divided by using an OCR technology.
In an embodiment, the dividing image and recognizing unit is further adapted to score the text contents recognized to identify recognition accuracy.
According to another aspect of the present invention, a device for entering text based on an image is provided. The device includes: a memory, a processor, and a computer program stored in the memory and executed by the processor, wherein when the computer program is executed by the processor, the processor implements the following steps: acquiring a recognition parameter corresponding to at least one region in an image, the recognition parameter comprising text contents recognized from the at least one region and location information associated with the at least one region; selecting an entry location in an entry page and acquiring location information corresponding to a selected entry location; and determining text contents to be entered, based on the location information corresponding to the selected entry location and the recognition parameter.
In an embodiment, when implementing the step of acquiring location information corresponding to a selected entry location, the processor specifically implements the following steps: acquiring a parameter value shared by a plurality of tab pages; and automatically locating, according to the parameter value shared by the plurality of tab pages, a display page to a region corresponding to the selected entry location; wherein the parameter value shared by the plurality of tab pages comprises location information corresponding to the selected entry location.
In an embodiment, when implementing the step of automatically locating, according to the parameter value shared by the plurality of tab pages, a display page to a region corresponding to the selected entry location, the processor specifically implements the following step: zooming in on the region corresponding to the selected entry location.
In an embodiment, when implementing the step of acquiring a recognition parameter corresponding to at least one region in an image, the processor specifically implements the following step: dividing the image into regions automatically and recognizing text contents in the regions automatically divided.
In an embodiment, when implementing the step of recognizing text contents in the regions automatically divided, the processor specifically implements the following step: recognizing text contents in the regions automatically divided by using an OCR technology.
In an embodiment, when implementing the step of recognizing text contents in the regions automatically divided, the processor specifically implements the following step: scoring the text contents recognized to identify recognition accuracy.
According to another aspect of the present invention, a computer readable storage medium is provided, the computer readable storage medium storing executable instructions to be executed by a processor; when the processor executes the executable instructions, the processor performs a method described above.
The beneficial technical effects of the present invention are as follows:
The method for entering text based on an image provided by embodiments of the present invention makes it possible to efficiently perform the interactive operation of fast entry of forms, tickets, documents and so on. When a data-entry clerk enters text in a selected entry box, the image is automatically placed at the corresponding location and the contents of the uploaded image are enlarged, so the data-entry clerk does not need to drag the image manually to accomplish the entry; the time of entering according to the image may be greatly saved, and the entry efficiency may be improved. In addition, by marking the text contents recognized through the OCR technology with a recognition accuracy, users who want to perform a review can check quickly and directly according to the recognition accuracy, so the review time can be effectively reduced and the entry efficiency can be greatly improved.
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings that form a part of the present invention. The accompanying drawings illustrate, by way of example, specific embodiments in which the present invention may be practiced. The exemplary embodiments are not intended to be exhaustive of all embodiments in accordance with the present invention. It should be understood that other embodiments may be utilized, and structural or logical modifications may be made, without departing from the scope of the present invention. Therefore, the following detailed description is not restrictive, and the scope of the present invention is defined by the appended claims.
The present invention is described in detail below with reference to the accompanying drawings.
The present invention provides a method for entering text based on an image, and the method includes the following steps.
Step S101: a recognition parameter corresponding to at least one region in an image is acquired, wherein the recognition parameter includes text contents recognized from the at least one region and location information associated with the at least one region.
Step S102: an entry location is selected in an entry page, and the following steps are performed: a parameter value shared by a plurality of tab pages is acquired; and according to the parameter value shared by the plurality of tab pages, a display page is automatically located to a region corresponding to the selected entry location. The parameter value shared by the plurality of tab pages comprises location information corresponding to the selected entry location.
Step S103: based on the location information corresponding to the selected entry location and the recognition parameter, text contents to be entered are determined.
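Step S103 can be sketched, under assumptions, as a lookup: the location information of the selected entry location is matched against the location information carried in the recognition parameter, and the text of the matching region is returned. The equality-based matching, the dictionary layout, and the function name `determine_text` are illustrative choices, not details prescribed by the method.

```python
def determine_text(entry_location, recognition_parameter):
    """Return the recognized text contents whose region location matches
    the location information of the selected entry location, or None."""
    for item in recognition_parameter:
        if item["location"] == entry_location:
            return item["text"]
    return None  # no recognized contents; fall back to manual entry

# Hypothetical recognition parameter for a bill image.
recognition_parameter = [
    {"location": (120, 40, 200, 30), "text": "2018-11-20"},
    {"location": (120, 90, 160, 30), "text": "1,250.00"},
]

# Selecting the entry location tied to the second region yields its text.
text = determine_text((120, 90, 160, 30), recognition_parameter)
```

Returning `None` when no region matches corresponds to the pure manual entry path described later in the embodiment.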
It should be understood that the image targeted by the method includes multiple types of paper documents such as bills, tables, documents, and so on. The image is not limited to a specific type of paper documents. The method for entering text based on an image, according to embodiments of the present invention, is further elaborated below by taking the bill as an example.
An implementation process of the bill text entry will be described in detail below with reference to the accompanying drawings.
Step S201: an image of a bill is uploaded to an entry system.
In this step, the user needs to upload the required bill to an entry system by any suitable means, such as scanning. When the bill is not uploaded to the entry system correctly, the entry system will provide a notice, according to the type of the upload error, that the bill needs to be re-uploaded.
Step S202: it is judged whether an automatically dividing image model exists in the entry system.
When an automatically dividing image model exists in the entry system, Step S203 is executed, otherwise Step S204 is executed.
Step S203: the image of the bill is divided into regions automatically by the automatically dividing image model, and location information of the regions is acquired.
The automatically dividing image model in this embodiment is a model based on a machine learning algorithm, and an image is divided automatically into regions by determining a location of a keyword in the image. It should be understood that the image may also be divided automatically into regions by any suitable models and in any suitable manners.
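One way the keyword-based division might look is sketched below: given the detected locations of known keywords (e.g. field labels on a bill), each keyword is assigned a value region immediately to its right. The keyword list, the right-of-keyword assumption, and the fixed region size are illustrative assumptions, not details of the actual machine learning model.

```python
def divide_by_keywords(keyword_locations, value_width=200, value_height=30):
    """keyword_locations maps a keyword to the (x, y, w, h) box where it was
    found in the image; the value region is assumed to sit directly to the
    right of its keyword, at the same vertical position."""
    regions = {}
    for keyword, (x, y, w, h) in keyword_locations.items():
        regions[keyword] = (x + w, y, value_width, value_height)
    return regions

# Hypothetical keyword locations detected on a bill image.
regions = divide_by_keywords({
    "Date": (20, 40, 60, 30),
    "Amount": (20, 90, 90, 30),
})
# Each value region starts where its keyword box ends.
```

A real model would of course learn layouts rather than hard-code offsets; the sketch only shows how keyword locations can anchor region boundaries.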
Step S204: a pure manual entry mode is used.
Step S205: text contents in the regions divided automatically are recognized automatically by using an OCR technology.
It should be understood that text contents in the regions that have been divided automatically may also be recognized automatically by any other suitable manner.
Step S206: the text contents recognized are scored to identify recognition accuracy. By system default, a high score indicates a recognition item with high recognition accuracy, and a low score indicates a recognition item with low recognition accuracy. For example, in this embodiment, a recognition item with a score of 85 or higher is regarded as a recognition item with high recognition accuracy and is marked with a small rectangular frame, as shown in the accompanying drawings.
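The scoring rule of this embodiment can be sketched as a simple threshold check; the 0-100 score scale is an assumption suggested by the example threshold of 85, and the sample items are invented.

```python
HIGH_ACCURACY_THRESHOLD = 85  # per this embodiment: a score of 85 or higher is "high"

def classify_recognition(items):
    """Split scored recognition items into high- and low-accuracy groups,
    so the data-entry clerk can focus review on the low-accuracy ones."""
    high = [it for it in items if it["score"] >= HIGH_ACCURACY_THRESHOLD]
    low = [it for it in items if it["score"] < HIGH_ACCURACY_THRESHOLD]
    return high, low

high, low = classify_recognition([
    {"text": "2018-11-20", "score": 97},
    {"text": "1,25O.00", "score": 62},  # likely misread: letter O for zero
])
```

As the description notes, a score threshold is only one possible manner of identifying recognition accuracy.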
It should be understood that the purpose of identifying recognition accuracy is to facilitate a rapid view by the data-entry clerk: recognition items with high accuracy can be quickly confirmed to complete the entry, while attention may be focused on recognition items with low recognition accuracy so that recognition inaccuracies may be corrected in time; thus the review time may be shortened. A scoring system is only one manner of identifying the recognition accuracy, and the setting of a score level is not unique. Those skilled in the art may identify the recognition accuracy by other suitable manners.
Step S207: when the data-entry clerk selects an entry box for text entry in the entry page, in response to the selected entry box, the display page is automatically located to the region corresponding to the keyword of the selected entry box, as shown in the accompanying drawings.
In an implementation process of this embodiment, browser cross-tab communication is adopted. Specifically, the browser window is used to monitor changes of the Local Storage. A value in the Local Storage may be shared among different tabs, and linkage between the entry page and the display page is implemented according to the characteristics of the storage event. The specific implementation manner is as follows.
Firstly, the location information of a region automatically divided from the image of the bill in Step S203 is represented by a coordinate point (x, y, w, h), as shown in the accompanying drawings.
Then, an initialization process is carried out: the coordinate point representing the location information of each region that has been automatically divided, and the text contents recognized from that region in Step S205, are stored in the Local Storage.
Subsequently, a mouse sliding event is monitored. When a user slides the mouse from the current location of an entry box to the location of the entry box that needs to be entered, the keyword corresponding to that entry box is obtained, and the coordinate point of the new location information corresponding to the keyword, together with the text contents corresponding to the coordinate point, are used to update the corresponding value in the Local Storage.
Then, changes of the Local Storage are monitored at the display page; according to the updated value carried by the monitored storage event, the image is translated to the corresponding region in the display page and the corresponding region is enlarged.
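The linkage described above can be simulated outside the browser. The sketch below stands in for the browser's `localStorage` and its `storage` event with a plain Python object, purely to illustrate the publish/listen flow between the entry page and the display page; the class `FakeLocalStorage`, the key `"selected"`, and the sample payload are all illustrative inventions.

```python
import json

class FakeLocalStorage:
    """Minimal stand-in for the browser's Local Storage: setting a key
    notifies registered listeners, mimicking the 'storage' event that
    other tabs receive when a shared value changes."""
    def __init__(self):
        self._data = {}
        self._listeners = []

    def add_listener(self, fn):
        self._listeners.append(fn)

    def set_item(self, key, value):
        self._data[key] = value
        for fn in self._listeners:
            fn(key, value)  # like a storage event fired in another tab

storage = FakeLocalStorage()
shown = {}

def display_page_listener(key, value):
    # The display page reacts to the updated shared value by panning
    # to the region's coordinates and enlarging that region.
    if key == "selected":
        info = json.loads(value)
        shown["region"] = tuple(info["region"])
        shown["text"] = info["text"]

storage.add_listener(display_page_listener)

# Entry page: the clerk slides the mouse to the "Amount" entry box, so the
# shared value is updated with that keyword's region and recognized text.
storage.set_item("selected", json.dumps(
    {"keyword": "Amount", "region": [120, 90, 160, 30], "text": "1,250.00"}))
```

In a real browser the `storage` event fires only in *other* tabs, which is exactly what makes it suitable for entry-page-to-display-page linkage.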
It should be understood that browser cross-tab communication may also be achieved by other schemes such as the Broadcast Channel, Cookies, the Web Socket and so on. However, the Local Storage has better compatibility and a longer life cycle than the Broadcast Channel. Compared with Cookies: because there is no event notification when a cookie is changed, business logic can be implemented only by repeated dirty checking, and the business logic can be used only within the same domain; moreover, once Cookies accumulate, they are additionally added to the request header of every AJAX request, and the available storage space is small, limited to 4K. The Web Socket, while suitable for small projects, requires backend servers to maintain connections and forward subsequent information, which occupies more server resources. Therefore, in this embodiment, the Local Storage is used to achieve the browser cross-tab communication.
Step S208: when there are recognized text contents for the entry box selected by the mouse in the entry page shown in the accompanying drawings, Step S209 is executed; otherwise Step S210 is executed.
Step S209: it is judged whether the text contents are recognized exactly; when the text contents are recognized exactly, Step S212 is executed; otherwise Step S211 is executed.

Step S210: in the entry box, the text contents are entered manually according to the content displayed on the display page, and then Step S212 is executed.

Step S211: the text contents recognized are amended manually in the entry box.

Step S212: the confirmation is clicked, and the entry is completed.
In addition, the accompanying drawings show a device for entering text based on an image according to an embodiment of the present invention.
Further, in an embodiment, the acquiring recognition parameter unit 501 includes a dividing image and recognizing unit 501a; the dividing image and recognizing unit 501a is adapted to divide the image into regions automatically and recognize text contents in the regions. In an embodiment, the dividing image and recognizing unit 501a is further adapted to recognize text contents in the regions by using an OCR technology. In another embodiment, the dividing image and recognizing unit 501a is further adapted to score the text contents recognized to identify recognition accuracy.
Further, in an embodiment, the selecting and entering linkage unit 502 includes a zooming image unit 502a; the zooming image unit 502a is adapted to zoom in on the region corresponding to the selected entry location.
A flow of the text entry method is shown in the accompanying drawings.
As described above, an example process of the text entry method may be implemented using coded instructions stored on a tangible computer readable medium.
As used herein, the term tangible computer readable medium is expressly defined to include any type of computer readable stored signal. Additionally or alternatively, the example process described above may be implemented using coded instructions stored on a non-transitory computer readable medium.
Although the present invention has been described with reference to specific examples, which are intended to be illustrative only and not to limit the present invention, it will be apparent to those skilled in the art that changes, additions or deletions may be made to the disclosed embodiments without departing from the spirit and scope of the present invention.
Number | Date | Country | Kind
---|---|---|---
201711166037.1 | Nov 2017 | CN | national
This application is a continuation of International Application No. PCT/CN2018/116414 filed on Nov. 20, 2018, which claims priority to Chinese patent application No. 201711166037.1 filed on Nov. 21, 2017. Both applications are incorporated herein by reference in their entireties.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CN2018/116414 | Nov 2018 | US
Child | 16288459 | | US