This application claims priority to and benefits of Chinese Application No. 202011506446.3, filed on Dec. 18, 2020, the entire content of which is incorporated herein by reference.
The disclosure relates to the field of deep learning technology and the field of image processing technology, and more particularly to a method and an apparatus for character recognition and processing.
Character recognition is a method for extracting text information from an image, which is widely used in finance, education, audit, transportation and many other areas related to national economy and people's livelihood.
When performing the character recognition, recognized characters are arranged based on a relative occurrence sequence in a picture. For example, the recognized characters are arranged from left to right based on a sequence of these characters occurring in the picture.
A method for character recognition and processing is provided here. In one embodiment, a respective character region is labelled for each character contained in each sample image of a sample image set. A respective character category and a respective character position code corresponding to each character region are labelled. A preset neural network model for character recognition is trained based on the sample image set having labelled character regions, character categories and character position codes corresponding to the character regions.
An electronic device is provided here. In one embodiment, the electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores instructions executable by the at least one processor. When the instructions are executed by the at least one processor, the at least one processor is caused to execute a method for character recognition and processing described above.
A non-transitory computer-readable storage medium having computer instructions stored thereon is provided here. In one embodiment, the computer instructions are configured to cause a computer to execute a method for character recognition and processing described above.
It is to be understood that the content described in this part is not intended to identify key or important features of embodiments of the disclosure, nor intended to limit the scope of the disclosure. Other features of the disclosure will become easy to understand through the following specification.
The drawings are intended to help those skilled in the art better understand the technical solution of the disclosure and do not constitute a limitation of the disclosure.
Exemplary embodiments of the disclosure are described below with reference to the accompanying drawings, which include various details of embodiments of the disclosure to facilitate understanding and should be considered as merely exemplary. Therefore, those skilled in the art should realize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the disclosure. Similarly, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following descriptions.
As mentioned above, the character sequence consisting of recognized characters may be wrong due to disorder of the recognized characters when performing the character recognition based on the relative occurrence positions of characters in the picture. For example, as illustrated in
To solve the above technical problem, the disclosure provides a method for recognizing characters based on semantic segmentation, which determines a relative order of each recognized character in a final character sequence by predicting character position codes.
In detail,
At block 201, each character contained in each sample image of a sample image set is labelled using a respective character region.
The sample image set refers to a set of sample images, containing a large number of sample images. Each sample image contains multiple characters, including but not limited to English letters, numbers, Chinese characters, etc.
The character region may be provided for each character contained in the sample image. The character region may be a box enclosing the character and is configured to determine a position of the character.
It is to be noted that, depending on different application scenarios, the character regions may be provided for the characters in different manners as follows.
For each character contained in each sample image, positional coordinates of a character box corresponding to the character are obtained. The positional coordinates may include coordinates of a central pixel of the character, as well as a length and a width of the character box. The length and the width of the character box may be determined based on coordinates of an uppermost pixel, coordinates of a lowermost pixel, coordinates of a leftmost pixel, and coordinates of a rightmost pixel of the character.
In addition, the character box can be contracted based on a preset contraction ratio and the positional coordinates, to differentiate different character regions and avoid a case that two identical characters adjacent to each other are identified as one character. The character region is labelled on the picture based on the positional coordinates of the contracted character box.
A value of the preset contraction ratio may be set based on experiments or based on a distance between adjacent characters. For example, a standard distance corresponding to a certain contraction ratio may be determined, and distances between central pixels of every two adjacent characters contained in the image are determined. If the distances are all greater than the standard distance, differences between the distances and the standard distance are obtained. If the differences are all greater than a preset distance threshold, it indicates that there is no risk of identifying the two adjacent characters as one. In this case, the certain contraction ratio may be set as 1. If one of the distances is less than the standard distance, the difference between the standard distance and the distance is obtained, and an increment value of the contraction ratio is determined based on the difference, where the difference is directly proportional to the increment value. Further, a final contraction ratio of the corresponding character box (i.e., any one of two adjacent character boxes corresponding to the distance less than the standard distance) is determined by adding the increment value to the certain contraction ratio.
For example, for the sample image illustrated in the drawings, in order to separate characters in the form of connectivity domains (i.e., characters are separated from each other, such that each separated result is a connectivity domain representing a respective character), each character box having the positional coordinates of (cx, cy, w, h) can be contracted to obtain a contracted character box having the positional coordinates of (cx, cy, w*r, h*r), where cx and cy represent coordinates of the central pixel of the character box, w represents the width of the character box, h represents the height of the character box, and r represents the contraction ratio.
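For illustration only, the contraction described above may be sketched as the following minimal Python example; the function names, the example contraction ratio r = 0.7, and the corner-conversion helper are illustrative assumptions rather than part of the disclosure.

```python
# Minimal sketch (assumed helpers, not the claimed implementation): contract a
# character box (cx, cy, w, h) by a ratio r so that adjacent identical characters
# are less likely to merge into a single connectivity domain.

def contract_box(cx, cy, w, h, r=0.7):
    """Return the contracted box (cx, cy, w * r, h * r); the center is kept."""
    return cx, cy, w * r, h * r

def box_to_corners(cx, cy, w, h):
    """Convert a center-size box to (x_min, y_min, x_max, y_max) pixel corners."""
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

if __name__ == "__main__":
    # Hypothetical character box: centered at (50, 20), 30 px wide, 40 px high.
    contracted = contract_box(50, 20, 30, 40, r=0.7)
    print(box_to_corners(*contracted))  # corners of the contracted label region
```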
A semantic recognition model is obtained in advance through training based on deep learning technology. For each pixel contained in each sample image of the sample image set, a respective probability recognized by the semantic recognition model that the pixel corresponds to each character category is determined. A character category having a largest probability value is determined as the character category for the pixel. A connectivity domain formed by pixels corresponding to a common character category is determined as the character region.
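A possible way to derive such character regions from per-pixel probabilities is sketched below. The sketch assumes the semantic recognition model outputs an array of shape (C+1, H, W) with channel 0 as the background category, and uses scipy.ndimage.label to extract connectivity domains; both are implementation assumptions, not requirements of the disclosure.

```python
import numpy as np
from scipy import ndimage  # used only to extract connectivity domains

def regions_from_probs(probs):
    """probs: array of shape (C+1, H, W) of per-pixel category probabilities,
    with channel 0 assumed to be the background category.

    Returns a list of (character_category, boolean_mask) pairs, one per
    connectivity domain formed by pixels sharing a common non-background category.
    """
    per_pixel_category = probs.argmax(axis=0)   # category with the largest probability
    regions = []
    for category in range(1, probs.shape[0]):   # skip the background category 0
        labeled, num = ndimage.label(per_pixel_category == category)
        for idx in range(1, num + 1):
            regions.append((category, labeled == idx))
    return regions

# Hypothetical toy example: 2 character categories plus background on a 4 x 6 image.
probs = np.random.rand(3, 4, 6)
probs /= probs.sum(axis=0, keepdims=True)
print(len(regions_from_probs(probs)))
```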
The character region may be configured to record pixel positions of the character box. The character region may be provided as lines in the image.
At block 202, each character region is labelled with a respective character category and a respective character position code.
To further recognize the relative order of each character, character categories and character position codes corresponding to the character regions can be provided on the picture. Sequence information (i.e., the relative orders) of multiple characters may be determined based on the character position codes.
It is to be noted that, the character position code refers to any information for deducing the relative order of the corresponding character or the character sequence, which will be described in detail with following examples.
A preset length threshold of character string is obtained. A position index value of each character region is obtained. The position index value may be any information indicating a relative position of the character in the image. For example, a predictable length threshold of character string determined based on a recognition ability of the model may be L. For each character, the position index value refers to a relative order number i of the character in the image, where i is a positive integer. The larger the relative order number, the later the character occurs in the image. For example, a character “A” has a relative order number of 2, a character “C” has a relative order number of 1, and a character “N” has a relative order number of 3. In this case, the character “A” is after the character “C” in the image, and the character “N” is after the character “A” in the image. A calculation is performed based on the length threshold of character string and the position index value through a preset algorithm. The character position code corresponding to each character region is obtained based on the calculation result. For example, the preset algorithm is pi=1−i/L, where pi represents the character position code, and i represents the relative order number of the character in the image. For example, in the sample image illustrated in
In addition, the above preset algorithm may include calculating a ratio of the position index value to the preset length threshold of character string, or calculating a product of the position index value and the preset length threshold of character string.
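As a concrete illustration of the calculation described above, the following sketch implements the example algorithm pi = 1 − i/L together with the ratio and product variants; the length threshold L = 25 and the function name are assumed values for illustration.

```python
def position_code(i, L=25, mode="one_minus_ratio"):
    """Compute a character position code from the relative order number i (1-based)
    and the preset length threshold L of the character string."""
    if mode == "one_minus_ratio":   # example algorithm from the text: p_i = 1 - i / L
        return 1 - i / L
    if mode == "ratio":             # alternative: ratio of the index value to the threshold
        return i / L
    if mode == "product":           # alternative: product of the index value and the threshold
        return i * L
    raise ValueError(mode)

# Under the default mode, a later character always has a smaller code.
print([position_code(i) for i in (1, 2, 3)])  # [0.96, 0.92, 0.88]
```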
Certainly, when the characters contained in the sample image are out of order, values of their pi may not indicate the relative order numbers of the characters. For example, when the characters contained in the sample image are "Ttex", after learning, the character corresponding to p2 should be after the characters corresponding to p3 and p4.
For each character having a certain order in a sample image, a respective distance between a character feature of the character and a character semantic feature is recognized. The distance between the character feature and the character semantic feature, as well as the order of the character, are determined as the character position code. The order may be the relative order number.
The character position code is determined based on two dimensions, i.e., the semantic and the order, to improve the accuracy of determining the character order.
The character category may be understood as referring to a character, such as character "A" or character "B" or the like. In this case, a character belonging to a character category means that the character is, for example, "A", "B" or the like. Semantic recognition may be performed on each sample image. For example, a deep learning model can be obtained through training based on deep learning technology in advance, and each sample image is recognized by the deep learning model to obtain, for each character contained in the sample image, a respective probability that the character belongs to each character category. Multiple semantic segmentation images may be obtained based on the probabilities.
For example, as illustrated in
Further, for each character region of the sample image, a respective average probability of probabilities that all pixels within the character region belong to each character category is obtained. A character category corresponding to the maximum average probability is taken as the character category of the character region, and each pixel within the character region is assigned with a preset index value corresponding to the character category, to label the character category for each character region. The index value may be in any form. For example, the index value is ci, where ci∈[0, C], 0 represents a background category, and C is the total number of character categories.
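The region-level labelling described above may be sketched as follows; the array shapes, the boolean region mask, and the function name are illustrative assumptions.

```python
import numpy as np

def label_region_category(probs, region_mask, label_map):
    """Assign a character region the category with the largest average probability.

    probs:       array (C+1, H, W) of per-pixel category probabilities.
    region_mask: boolean array (H, W) marking the pixels of one character region.
    label_map:   int array (H, W) receiving the preset index value ci in [0, C]
                 (0 is reserved for the background category).
    """
    avg_per_category = probs[:, region_mask].mean(axis=1)  # average probability per category
    category = int(avg_per_category.argmax())               # category with the maximum average
    label_map[region_mask] = category                       # label every pixel of the region
    return category

# Hypothetical usage on a 4 x 6 image with C = 2 character categories.
probs = np.random.rand(3, 4, 6)
probs /= probs.sum(axis=0, keepdims=True)
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:4] = True
label_map = np.zeros((4, 6), dtype=int)
print(label_region_category(probs, mask, label_map))
```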
The character category may be determined based on a shape feature of a connectivity domain formed by pixels all belonging to a common image feature.
At block 203, a preset neural network model for character recognition is trained based on the sample image set having character regions labelled therein, as well as the character category and the character position code corresponding to each character region.
After training the preset neural network model for character recognition based on the sample image set labelled with character regions, as well as the character categories and the character position codes corresponding to the character regions, the preset neural network model may recognize characters based on the character regions and determine the relative order number of each character and the character sequence based on the character position codes. The neural network model may be trained based on the deep learning technology. For example, the mentioned neural network model may be a Fully Convolutional Network (FCN).
Certainly, for training the neural network model for character recognition, a classification loss function may be adopted for the purpose of optimization. That is, the labelled character categories and the labelled character position codes are compared with the character categories and the character position codes predicted by the neural network model for the input sample image, to calculate a loss value. When the loss value is greater than a preset threshold, a model coefficient of the neural network model is adjusted until the loss value is less than the preset threshold. Theoretically, regression loss functions, such as an L2 loss, an L1 loss, and a Smooth L1 loss, may be used as a loss function.
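For illustration only, a loss computation consistent with the above description might look like the following PyTorch sketch. The tensor shapes, the use of cross-entropy for the character categories, the use of Smooth L1 for the character position codes, and the toy tensors are assumptions made for this sketch and are not dictated by the disclosure.

```python
import torch
import torch.nn as nn

# Assumed model outputs for a batch of N images of size H x W:
#   seg_logits: (N, C+1, H, W) - per-pixel character category scores
#   pos_pred:   (N, H, W)      - per-pixel character position code
# Assumed labels built from the annotations of blocks 201 and 202:
#   seg_target: (N, H, W) long  - preset index value of each pixel (0 = background)
#   pos_target: (N, H, W) float - labelled character position code of each pixel

classification_loss = nn.CrossEntropyLoss()  # classification loss over character categories
regression_loss = nn.SmoothL1Loss()          # e.g., Smooth L1 loss for position codes

def training_loss(seg_logits, pos_pred, seg_target, pos_target):
    """Loss value compared against the preset threshold during training."""
    return classification_loss(seg_logits, seg_target) + regression_loss(pos_pred, pos_target)

# Toy check with random tensors (N = 2, C = 3, 8 x 8 image).
seg_logits = torch.randn(2, 4, 8, 8, requires_grad=True)
pos_pred = torch.rand(2, 8, 8, requires_grad=True)
seg_target = torch.randint(0, 4, (2, 8, 8))
pos_target = torch.rand(2, 8, 8)
loss = training_loss(seg_logits, pos_pred, seg_target, pos_target)
loss.backward()  # gradients used to adjust the model coefficients
```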
With the method for character recognition and processing according to the disclosure, the character region is labelled for each character contained in each sample image of the sample image set, and the character category and the character position code are labelled for the character region. In addition, the preset neural network model for character recognition is trained based on the sample image set having labelled character regions, as well as the character category and the character position code corresponding to each character region. Thus, recognized characters are ordered based on the character position codes to obtain the relative order number of each recognized character. A final result is obtained by ordering and combining the recognized characters based on the relative order numbers, such that the recognized characters are correctly ordered.
After the neural network model is trained and obtained, given an image for testing, a character segmentation prediction image and a character position code prediction map may be obtained through the neural network model. Predicted characters and character position codes of the predicted characters are obtained based on the character segmentation prediction image and the character position code prediction map. The relative order number of each predicted character is obtained based on the character position codes, and the predicted characters are ordered based on the relative order numbers. The final result is obtained by combining the ordered predicted characters.
As illustrated in
At block 601, a target image to be recognized is obtained.
The target image includes multiple characters.
At block 602, the target image is processed based on a neural network model, to obtain predicted characters and character position codes corresponding to the predicted characters. Each predicted character corresponds to a respective character position code.
Since a correspondence between images and predicted characters, as well as character position codes of the predicted characters, is learnt by the neural network model in advance, the predicted characters and the character position codes can be obtained by processing the target image with the neural network model.
Since the character regions contained in the target image are not known in advance, the character regions can be obtained by the neural network model through semantic segmentation.
The target image can be segmented based on characters through the neural network model, to obtain semantic segmentation images. In detail, the target image is inputted to the neural network model to obtain (C+1) semantic segmentation images, where the size of each semantic segmentation image is the same as the size of the input image, and C is the total number of character categories. The extra semantic segmentation image is a background image, which represents, for each pixel, a probability that the pixel belongs to the background and a probability that the pixel belongs to a character. Each of the remaining semantic segmentation images represents probabilities that pixels contained in the original image belong to the respective character category corresponding to that semantic segmentation image.
Further, the background image is binarized by the neural network model to obtain a character binary map.
A connectivity domain of a character in the character binary map may be regarded as a character region corresponding to the character. Further, a position of the character can be obtained by calculating the connectivity domain based on the character binary map. For a semantic segmentation image, an average probability of probabilities that all pixels within a connectivity domain of the semantic segmentation image belong to a corresponding character category is calculated as a probability value that the connectivity domain belongs to the corresponding character category. For each semantic segmentation image, the probability values that the connectivity domains belong to the corresponding character category can be determined in the same manner described above. A character category corresponding to the maximum probability is taken as the character category of the corresponding connectivity domain.
A position index value of each connectivity domain is recognized by the neural network model, and the character position code corresponding to the connectivity domain is determined based on the position index value, as described above.
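A hedged end-to-end sketch of the inference steps described above is given below; it assumes the model output is available as a NumPy array of (C+1) probability maps plus a position code map, that the background map is binarized with a threshold of 0.5, and that the position code of a connectivity domain is the mean of the predicted codes over its pixels. These are illustrative assumptions only.

```python
import numpy as np
from scipy import ndimage

def decode_regions(probs, pos_map, background_threshold=0.5):
    """probs:   (C+1, H, W) probability maps; channel 0 is assumed to be the background map.
    pos_map: (H, W) predicted character position codes.

    Returns a list of (character_category, position_code) pairs, one per
    connectivity domain of the character binary map.
    """
    # Binarize the background map: a pixel is a character pixel if its
    # background probability is below the (assumed) threshold.
    char_binary = probs[0] < background_threshold
    labeled, num = ndimage.label(char_binary)  # connectivity domains = character regions
    results = []
    for idx in range(1, num + 1):
        mask = labeled == idx
        avg = probs[1:, mask].mean(axis=1)     # average probability per character category
        category = int(avg.argmax()) + 1       # +1 because channel 0 is the background
        code = float(pos_map[mask].mean())     # assumed: region code = mean predicted code
        results.append((category, code))
    return results

# Hypothetical usage with random maps (C = 2, 6 x 10 image).
probs = np.random.rand(3, 6, 10)
probs /= probs.sum(axis=0, keepdims=True)
print(decode_regions(probs, np.random.rand(6, 10)))
```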
At block 603, the predicted characters are ordered based on the character position codes corresponding to the predicted characters, to generate a target sequence of characters.
The character position codes can be used to deduce relative order numbers of the predicted characters. Thus, the predicted characters may be ordered based on the character position codes corresponding to the predicted characters, to generate the target sequence of characters.
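To illustrate the ordering at block 603, the following sketch sorts (character category, position code) pairs such as those produced at block 602. Under the example code pi = 1 − i/L, a larger code means an earlier character, so sorting by descending code recovers the reading order; the character set used to map categories back to characters is an assumption.

```python
def order_characters(predictions, charset):
    """predictions: list of (category, position_code) pairs for one target image.
    charset:     assumed mapping from 1-based category index to character.

    Characters are ordered by descending position code, since p_i = 1 - i / L
    decreases as the relative order number i grows.
    """
    ordered = sorted(predictions, key=lambda pair: pair[1], reverse=True)
    return "".join(charset[category - 1] for category, _ in ordered)

# Hypothetical example with an assumed lowercase charset: categories 20, 5, 24
# correspond to 't', 'e', 'x'; sorting by descending code yields "tex".
charset = "abcdefghijklmnopqrstuvwxyz"
predictions = [(5, 0.92), (24, 0.88), (20, 0.96)]
print(order_characters(predictions, charset))  # -> "tex"
```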
For example, the target image is illustrated in
With the method for character recognition and processing according to the disclosure, the target image to be recognized is obtained. The target image is processed by the neural network model to obtain predicted characters and character position codes corresponding to the predicted characters. Further, the predicted characters are ordered based on the character position codes corresponding to the predicted characters to generate the target sequence of characters. Thus, by predicting a character position code for each character, determining a relative order number for each character based on the character position code, and combining the characters, accuracy of determining a character string is improved.
In order to achieve the above embodiments, the disclosure further provides an apparatus for character recognition and processing.
The first labelling module 810 is configured to label a respective character region for each character contained in each sample image of a sample image set.
The second labelling module 820 is configured to label a respective character category and a respective character position code corresponding to each character region.
The training module 830 is configured to train a preset neural network model for character recognition based on the sample image set having character regions labelled therein, as well as the character categories and the character position codes corresponding to the character regions.
In some examples, the first labelling module 810 is further configured to obtain positional coordinates of a character box corresponding to each character contained in each sample image, contract the character box based on a preset contraction ratio and the position coordinates, and label the character region based on position coordinates of the contracted character box.
The second labelling module 820 is further configured to assign pixels contained in each character region with respective index values preset for the character category corresponding to the character region.
The second labelling module 820 is further configured to obtain a preset length threshold of character string; obtain a position index value of each character region; perform a calculation based on the length threshold of character string and the position index value through a preset algorithm, and label the character position code corresponding to each character region based on a calculation result.
It is to be noted that the foregoing explanation of method embodiments of the method for character recognition and processing also applies to the apparatus for character recognition and processing in apparatus embodiments. The implementation principles are similar, which are not repeated here.
As illustrated in
The first obtaining module 940 is configured to obtain a target image to be recognized. The second obtaining module 950 is configured to process the target image through a neural network model, to obtain predicted characters and character position codes corresponding to the predicted characters. Each character position code corresponds to a respective predicted character.
The ordering module 960 is configured to order the predicted characters based on the character position codes corresponding to the predicted characters, to generate a target sequence of characters.
It is to be noted that the foregoing explanation of method embodiments of the method for character recognition and processing are also applicable to the apparatus for character recognition and processing in apparatus embodiments. The implementation principles are similar, which are not repeated here.
The disclosure further provides an electronic device and a readable storage medium.
As illustrated in
The memory 1002 is a non-transitory computer-readable storage medium according to the disclosure. The memory stores instructions executable by the at least one processor to cause the at least one processor to execute a method for character recognition and processing according to the disclosure. The non-transitory computer-readable storage medium according to the disclosure is configured to store computer instructions. The computer instructions are configured to cause a computer to execute the method for character recognition and processing according to embodiments of the disclosure.
As a non-transitory computer-readable storage medium, the memory 1002 may be configured to store non-transitory software programs, non-transitory computer-executable programs and modules, such as program instructions/modules corresponding to the method for character recognition and processing in embodiments of the disclosure. The processor 1001 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 1002; that is, the method for character recognition and processing in the above method embodiments is implemented.
The memory 1002 may include a program storage area and a data storage area. The program storage area may store operating systems and application programs required by at least one function. The data storage area may store data created based on the use of an electronic device for character recognition and processing. In addition, the memory 1002 may include a high-speed random-access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices. In some examples, the memory 1002 optionally includes memories set remotely relative to the processor 1001, and these remote memories may be connected to an electronic device for character recognition and processing via a network. Examples of the above networks include, but are not limited to, the Internet, an enterprise intranet, a local area network, a mobile communication network and combinations thereof.
The electronic device for implementing the method for character recognition and processing may further include an input apparatus 1003 and an output apparatus 1004. The processor 1001, the memory 1002, the input apparatus 1003, and the output apparatus 1004 may be connected through a bus or in other ways.
The input apparatus 1003 may receive input digital or character information and generate key signal inputs related to user settings and function control of an electronic device for character recognition and processing, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, an indicating rod, one or more mouse buttons, a trackball, a joystick or another input apparatus. The output apparatus 1004 may include a display device, an auxiliary lighting apparatus (for example, an LED), a tactile feedback apparatus (for example, a vibration motor), etc. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display and a plasma display. In some implementations, the display device may be a touch screen.
Various implementation modes of the systems and technologies described herein may be implemented in a digital electronic circuit system, an integrated circuit system, a dedicated application specific integrated circuit (ASIC), computer hardware, firmware, software, and/or combinations thereof. The various implementation modes may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a dedicated or a general-purpose programmable processor that may receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit the data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
The computer programs (also called programs, software, software applications, or codes) include machine instructions of a programmable processor, and may be implemented with high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus configured to provide machine instructions and/or data for a programmable processor (for example, a magnetic disk, an optical disk, a memory, or a programmable logic device (PLD)), including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal configured to provide machine instructions and/or data for a programmable processor.
In order to provide interaction with the user, the systems and technologies described here may be implemented on a computer, and the computer has: a display apparatus for displaying information to the user (for example, a CRT (cathode ray tube) or an LCD (liquid crystal display) monitor); and a keyboard and a pointing apparatus (for example, a mouse or a trackball) through which the user may provide input to the computer. Other types of apparatuses may further be configured to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form (including an acoustic input, a voice input, or a tactile input).
The systems and technologies described herein may be implemented in a computing system including back-end components (for example, as a data server), or a computing system including middleware components (for example, an application server), or a computing system including front-end components (for example, a user computer with a graphical user interface or a web browser through which the user may interact with the implementation mode of the system and technology described herein), or a computing system including any combination of such back-end components, middleware components or front-end components. The system components may be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: a local area network (LAN), a wide area network (WAN), an internet and a blockchain network.
The computer system may include a client and a server. The client and the server are generally far away from each other and generally interact with each other through a communication network. The relation between the client and the server is generated by computer programs that run on the corresponding computers and have a client-server relationship with each other. A server may be a cloud server, also known as a cloud computing server or a cloud host, which is a host product in a cloud computing service system and solves the shortcomings of large management difficulty and weak business expansibility existing in traditional physical host and Virtual Private Server (VPS) services. A server may further be a server of a distributed system, or a server in combination with a blockchain.
In order to achieve the above embodiment, the disclosure further provides a computer program product. When instructions stored in the computer program product are executed by a processor, the method for character recognition and processing described above is executed.
It is to be understood that blocks may be reordered, added or deleted in the various forms of procedures shown above. For example, blocks described in the disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the disclosure can be achieved, which is not limited herein.
The above specific implementations do not constitute a limitation on the protection scope of the disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, etc., made within the spirit and principle of embodiments of the disclosure shall be included within the protection scope of embodiments of the disclosure.