An embodiment of the present invention relates to a character recognition device, a character recognition method, and a program.
A character recognition technique of recognizing characters included in an input image has been known in the related art. According to such a prior art, it is known that the accuracy of recognition can be improved by recognizing characters without explicitly dividing the boundaries between the characters. However, in a case where characters are recognized without dividing the boundaries between the characters, one character may be recognized overlappingly. In addition, depending on the input image, some characters may be skipped over during recognition. That is, according to the prior art, there is a problem in that characters included in the input image cannot be correctly recognized.
A problem to be solved by the present invention is to provide a character recognition device, a character recognition method, and a program that make it possible to correctly recognize characters included in an input image.
A character recognition device of an embodiment includes a first score calculation unit, a character region estimation unit, a second score calculation unit, and a selection unit. The first score calculation unit calculates a first score, which indicates a likelihood of a character string, for each of a plurality of candidate character strings which are candidates for character strings included in an input image. The character region estimation unit estimates a region corresponding to each character included in the candidate character string among regions of the input image. The second score calculation unit calculates a second score indicating a consistency of characters included in the candidate character string on the basis of the estimated region. The selection unit selects one or more character strings from among the plurality of candidate character strings on the basis of the calculated first score and the calculated second score.
Hereinafter, a character recognition device, a character recognition method, and a program of an embodiment will be described with reference to the accompanying drawings.
Problems that arise in a case where a character recognition method according to the prior art is used will be described with reference to
As an example of overlapping reading, a case in which an input image 91 is input will be described. In a case where correct character recognition is performed on a character string written in the input image 91, a region 92 and a region 93 are estimated as positions at which characters are written. Since “” is written in the region 92 and “
” is written in the region 93, a character recognition method according to the prior art makes it possible to recognize that “
” is written in the input image 91. On the other hand, in a case where the character string written in the input image 91 is erroneously recognized as characters, the region 92, the region 93, and a region 94 are estimated as positions at which the characters are written. Since “
” is written in the region 92, “
” is written in the region 93, and “
” is written in the region 94, the character recognition method according to the prior art results in erroneous recognition that “
” is written in the input image 91. In this way, in an example of overlapping reading, the left-hand side of “
” is recognized as “
” and then “
” is recognized again, resulting in the problem of the character “
” being recognized overlappingly.
Next, in an example of skipping, a case in which an input image 96 is input will be described. In a case where correct character recognition is performed on a character string written in the input image 96, a region 97, a region 98, and a region 99 are estimated as positions at which characters are written. Since “” is written in the region 97, “
” is written in the region 98, and “
” is written in the region 99, the character recognition method according to the prior art makes it possible to recognize that “
” is written in the input image 96. On the other hand, in a case where the character string written in the input image 96 is erroneously recognized as characters, the region 97 and the region 98 are estimated as positions at which the characters are written. Since “
” is written in the region 97 and “
” is written in the region 99, the character recognition method according to the prior art results in erroneous recognition that “
” is written in the input image 96. In this way, in an example of skipping, there is a problem in that the character “
” written in the region 98 is skipped over in recognition.
The character recognition device according to the present embodiment overcomes the problems caused by the prior art as described above. The character recognition device according to the present embodiment performs character string recognition on a character string written in an input image. The character string recognition is a task of recognizing a character string in an image using an image including the character string as an input. In the present embodiment, a horizontally written character string which is read from left to right will be described. Meanwhile, in the present embodiment, there is no limitation to a horizontally written character string which is read from left to right, and the same applies to a vertically written character string. Meanwhile, the image including a character string broadly includes an image of handwritten characters, a photographed signboard, a road sign, and the like. Meanwhile, in the present embodiment, a case in which the number of characters included in a character string is equal to or greater than 0 will be described.
The input image acquisition unit 21 acquires an input image IM. ” is written in the input image IM. In the present embodiment, an example of a case in which the handwritten character string S is written in the input image IM will be described.
Referring back to
The character recognition device 10 selects a plausible character string from among a plurality of candidate character strings CS calculated by the candidate character string calculation unit 22 as the selection character string SS on the basis of the input image IM acquired by the input image acquisition unit 21 and the plurality of candidate character strings CS calculated by the candidate character string calculation unit 22. The character recognition device 10 includes a first score calculation unit (character recognition unit) 110, a character region estimation unit 120, a second score calculation unit (region consistency score calculation unit) 130, and a selection unit 140.
The output unit 23 outputs the selection character string SS selected by the character recognition device 10. The output unit 23 outputs the selection character string SS, for example, by outputting information for displaying the selection character string SS on a display unit (not shown), outputting information for performing audio output from an audio output unit (not shown), or performing wireless output to an information processing device (not shown).
The first score calculation unit 110 calculates a first score S1 for each of the plurality of candidate character strings CS calculated by the candidate character string calculation unit 22. The candidate character string CS is a candidate for the character string S included in the input image IM. The first score S1 indicates the likelihood of a character string. That is, the first score calculation unit 110 calculates the first score S1, which indicates the likelihood of a character string, for each of the plurality of candidate character strings CS which are candidates for the character string S included in the input image IM.
” is included in the input image IM. In this example, the candidate character string calculation unit 22 calculates “
” as a candidate character string CS-1, “
” as a candidate character string CS-2, and “
” as a candidate character string CS-3. The first score calculation unit 110 calculates the first score S1 for each of the candidate character string CS-1, the candidate character string CS-2, and the candidate character string CS-3. In this example, a first score S1-1 of the candidate character string CS-1 is “0.5,” a first score S1-2 of the candidate character string CS-2 is “0.5,” and a first score S1-3 of the candidate character string CS-3 is “0.1.”
Referring back to
The second score calculation unit 130 calculates a second score S2 indicating the consistency of characters included in the candidate character string CS on the basis of the character region CA which is a region estimated by the character region estimation unit 120. Here, the consistency of characters included in the candidate character string CS refers to spatial consistency. In a case where there is no spatial consistency, characters may overlap each other or characters may be skipped.
” is included in the input image IM. In this example, the character region estimation unit 120 estimates the character region CA which is a region corresponding to each character C included in the candidate character string CS. For example, the character region estimation unit 120 estimates regions corresponding to the character C included in “
” which is the candidate character string CS-1 as a character region CA1-1, a character region CA2-1, and a character region CA3-1. In addition, the character region estimation unit 120 estimates regions corresponding to the character C included in “
” which is the candidate character string CS-2 as a character region CA1-2 and a character region CA2-2. In addition, the character region estimation unit 120 estimates regions corresponding to the character C included in “
” which is the candidate character string CS-3 as a character region CA1-3, a character region CA2-3, and a character region CA3-3.
The second score calculation unit 130 calculates the second score S2 for each of the plurality of candidate character strings CS. In this example, a second score S2-1 of the candidate character string CS-1 is “0.1,” a second score S2-2 of the candidate character string CS-2 is “1.0,” and a second score S2-3 of the candidate character string CS-3 is “1.0.”
Referring back to
(Step S110) The input image acquisition unit 21 acquires the input image IM. The candidate character string calculation unit 22 calculates the candidate character string CS which is a candidate for the character string S written in the input image IM. In this flowchart, a case in which the candidate character string calculation unit 22 calculates n (n is an integer equal to or greater than 1) candidate character strings CS will be described.
(Step S120) The first score calculation unit 110 calculates the first score S1 for each of the calculated plurality of candidate character strings CS. That is, in a case where the candidate character string CS is yn and the first score S1 is αn, the first score calculation unit 110 calculates (y1, α1), . . . , (yn, αn).
(Step S130) The second score calculation unit 130 sets a counter i to 1.
(Step S140) The character region estimation unit 120 estimates a region of the input image IM that corresponds to each of a plurality of characters C included in the candidate character string CS. In this flowchart, a case in which m characters are included in the candidate character string CS (m is an integer equal to or greater than 1) will be described. That is, characters C of yi,1, . . . , yi,m are included in yi which is the candidate character string CS. In this case, the character region estimation unit 120 estimates s1, . . . , sm which are the character regions CA corresponding to the respective characters C.
(Step S150) The second score calculation unit 130 calculates the second score S2 for yi which is the candidate character string CS. The second score S2 is also referred to as βn. In addition, βn which is the second score S2 is calculated on the basis of s1, . . . , sm.
(Step S160) The selection unit 140 calculates γi for yi on the basis of αi which is the first score S1 and βi which is the second score S2.
(Step S170) In a case where i<n, the second score calculation unit 130 advances the process to step S190. That is, the second score calculation unit 130 repeats the steps from step S140 to step S160 until the counter i reaches n, which is the number of candidate character strings CS calculated by the candidate character string calculation unit 22. In a case where i<n does not hold, that is, in a case where the counter i has reached n, the second score calculation unit 130 advances the process to step S180.
(Step S190) The second score calculation unit 130 increments the counter i, and advances the process to step S140.
(Step S180) The selection unit 140 selects the candidate character string CS in which γn becomes maximum as the selection character string SS. In this flowchart, the selection unit 140 selects the selection character string SS based on the maximum score. Meanwhile, the selection unit 140 may select the selection character string SS based on the minimum score in accordance with the method of calculating αn and βn.
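The flow of steps S120 to S180 can be sketched as follows. This is an illustrative sketch only: the function name `select_by_flowchart` is hypothetical, and the combined score γ is assumed here to be the product of α and β, whereas the actual combination used by the embodiment is defined by Expression (3).

```python
# Illustrative sketch of steps S120-S180 (names and the product form
# of gamma are assumptions, not the embodiment's actual implementation).

def select_by_flowchart(candidates, alpha, beta):
    """candidates: list of strings y_1..y_n; alpha, beta: score functions."""
    best_gamma, best_y = float("-inf"), None
    for y in candidates:                  # loop of steps S140-S170
        gamma = alpha(y) * beta(y)        # step S160: combine the two scores
        if gamma > best_gamma:
            best_gamma, best_y = gamma, y
    return best_y                         # step S180: candidate with maximum gamma
```

For example, a candidate with a high first score but a low consistency score loses to a candidate that scores well on both.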
The overlapping reading score calculation unit 131 calculates an overlapping reading score S21 which is a score indicating the amount of overlapping of the candidate character string CS. The amount of overlapping of the candidate character string CS is specifically an amount by which regions corresponding to the characters C included in the candidate character string CS overlap each other. The second score calculation unit 130A calculates the second score S2 on the basis of the calculated overlapping reading score S21. That is, in the present embodiment, the second score calculation unit 130A calculates the second score S2 on the basis of an amount by which regions corresponding to the characters C included in the candidate character string CS overlap each other.
” is written in the input image IM, and the character region estimation unit 120 estimates a character region CA1, a character region CA2, and a character region CA3 as the character region CA. Here, the area of an overlapping region CA-DP, which is a region where the character region CA2 and the character region CA3 overlap each other, corresponds to the amount of overlapping. Specifically, in a case where the amount of overlapping is m(y), the overlapping reading score calculation unit 131 calculates the following Expression (1) as an overlapping consistency score Povlp.
Here, COP is a constant between 0 and 1, and as this value decreases, the overlapping consistency score Povlp becomes smaller. The value of COP may be obtained experimentally.
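Since Expression (1) itself is not reproduced in this text, the following sketch assumes the common penalty form Povlp = COP^m(y), which is consistent with the description that Povlp becomes small as COP decreases. The exponential form and the one-dimensional intervals standing in for character regions are illustrative assumptions.

```python
# Assumed form of the overlapping consistency score: P_ovlp = C_OP ** m(y).
# Character regions are simplified to horizontal intervals (x0, x1).

def overlap_amount(regions):
    """m(y): total pairwise overlap between character-region intervals."""
    total = 0.0
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            (a0, a1), (b0, b1) = regions[i], regions[j]
            total += max(0.0, min(a1, b1) - max(a0, b0))
    return total

def p_ovlp(regions, c_op=0.5):
    """Overlapping consistency score; 1.0 when no regions overlap."""
    assert 0.0 < c_op < 1.0
    return c_op ** overlap_amount(regions)
```

With no overlap the score stays at 1.0; each unit of overlap multiplies the score by COP, so heavily overlapping readings are penalized.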
Referring back to
The character-likeliness map generation unit 1321 generates a character-likeliness map CLM. The character-likeliness map CLM indicates the likelihood that some character C exists in the image region of the input image IM.
The skipping score integration unit 1322 calculates the skipping score S22 on the basis of the character region CA estimated by the character region estimation unit 120 and the character-likeliness map CLM generated by the character-likeliness map generation unit 1321.
” in “
” which is the character string S included in the input image IM is estimated as the character region CA1, and the character “
” is estimated as the character region CA2. The character “
” is not estimated as the character region CA. That is, skipping has occurred.
Here, Uj(y) denotes a region where there is a high probability that some character C exists in the image region of the input image IM and that remains in the character-likeliness map CLM after filtering. In a case where the image region of the input image IM is divided into a width W and a height H, the skipping score integration unit 1322 calculates the following Expression (2) as a skipping consistency score PSKIP(y). Meanwhile, the image region of the input image IM may be divided into pixel units of the input image IM, or may be divided into units of a predetermined range constituted by a plurality of pixels.
Here, CSP is a constant equal to or greater than 0, and as CSP becomes large, the skipping consistency score PSKIP decreases. The value of CSP may be obtained experimentally. Meanwhile, in a case where no skipping penalty is imposed, CSP may be set to 0.
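Expression (2) is likewise not reproduced in this text. The following sketch assumes one plausible form, PSKIP(y) = exp(-CSP × u), where u is the number of character-likely cells of the divided image region that no estimated character region CA covers. This form matches the description that PSKIP decreases as CSP becomes large and that CSP = 0 imposes no penalty, but the exact expression is an assumption.

```python
# Assumed skipping consistency score: P_SKIP(y) = exp(-C_SP * uncovered),
# where "uncovered" counts character-likely cells not covered by any
# estimated character region CA. Grid cells stand in for the W x H division.

import math

def p_skip(likeliness, regions, c_sp=1.0, threshold=0.5):
    """likeliness: H x W grid of character-likeliness values;
    regions: (x0, y0, x1, y1) boxes in cell coordinates."""
    def covered(x, y):
        return any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in regions)
    uncovered = sum(
        1 for yy, row in enumerate(likeliness)
        for xx, v in enumerate(row)
        if v >= threshold and not covered(xx, yy)   # character-likely, skipped
    )
    return math.exp(-c_sp * uncovered)
```

A candidate whose estimated regions cover every character-likely cell keeps PSKIP = 1.0, while skipped characters drive the score down; with c_sp = 0 no penalty is imposed, as stated above.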
Referring back to
The character recognition score calculation unit 111 calculates a character recognition score S11 for each candidate character string CS. The character recognition score S11 indicates the likelihood of a character string.
The knowledge processing score calculation unit 112 calculates a knowledge processing score S12 for each candidate character string CS. The knowledge processing score calculation unit 112 is used in a case where the candidate character strings CS that can be written in the input image IM are limited. The case where the candidate character strings CS that can be written in the input image IM are limited is, for example, a case where information that the input image IM is a zip code, an address, a name, or the like is obtained in advance. In a case where it is known that the input image IM is a zip code and the candidate character string CS is not a numeral, the knowledge processing score S12 is calculated to be low. In addition, in a case where it is known that the input image IM is an address, the knowledge processing score S12 is calculated to be lower for “
.”
The first score integration unit 113 calculates the first score S1 on the basis of the character recognition score S11 calculated by the character recognition score calculation unit 111 and the knowledge processing score S12 calculated by the knowledge processing score calculation unit 112. The selection unit 140 selects the selection character string SS on the basis of the calculated first score S1 and second score S2.
Here, an example in which the selection unit 140 selects the selection character string SS on the basis of the character recognition score S11, the knowledge processing score S12, the overlapping reading score S21, and the skipping score S22 will be described. In this case, the selection unit 140 selects the selection character string SS on the basis of the following Expression (3).
Specifically, the selection unit 140 selects, as the selection character string SS, the candidate character string CS that has the maximum value obtained by multiplying POCR which is the character recognition score S11, PLM which is the knowledge processing score S12, Povlp which is the overlapping reading score S21, and Pskip which is the skipping score S22.
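The selection described above can be sketched directly: the candidate character string with the maximum product of the four scores is chosen. The function name and the dictionary-based scores below are illustrative.

```python
# Sketch of Expression (3): select the candidate maximizing
# P_OCR * P_LM * P_ovlp * P_skip. Score lookups are illustrative.

def select_string(candidates, p_ocr, p_lm, p_ovlp, p_skip):
    """Each p_* maps a candidate string to its score in [0, 1]."""
    return max(candidates,
               key=lambda y: p_ocr[y] * p_lm[y] * p_ovlp[y] * p_skip[y])
```

Because the scores are multiplied, a candidate that is strong on character recognition alone can still lose to one that is consistent across all four criteria.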
According to the above-described embodiment, the character recognition device 10 includes the first score calculation unit 110 to calculate the first score S1 indicating the likelihood of the character string S for each candidate character string CS, includes the character region estimation unit 120 to estimate a region for each character C included in the character string S, includes the second score calculation unit 130 to calculate the second score S2 indicating the consistency of the character C, and includes the selection unit 140 to select the selection character string SS on the basis of the first score S1 and the second score S2. That is, according to the above-described embodiment, the maximum likelihood character string is selected in consideration of the consistency of the region where the character C exists. Therefore, the character recognition device 10 can correctly recognize the character C included in the input image IM.
In addition, according to the above-described embodiment, the second score calculation unit 130 calculates the second score S2 on the basis of the overlapping reading score S21. The overlapping reading score S21 is a score according to an amount by which regions corresponding to the characters C included in the candidate character string CS overlap each other. Therefore, according to the present embodiment, the character recognition device 10 can prevent overlapping reading, and can therefore correctly recognize the character C included in the input image IM.
In addition, according to the above-described embodiment, the second score calculation unit 130 calculates the second score S2 on the basis of the skipping score S22. The skipping score S22 is a score based on the characters C included in the candidate character string CS and the character region CA estimated by the character region estimation unit 120, and in a case where skipping has occurred, a larger penalty is given. Therefore, according to the present embodiment, the character recognition device 10 can prevent skipping, and can therefore correctly recognize the character C included in the input image IM.
Here, according to the prior art, there is a trade-off between the improvement of overlapping reading and the improvement of skipping: improving one of them makes the other problem more likely to occur. According to the above-described embodiment, the overlapping reading score S21 and the skipping score S22 are calculated separately, and the selection character string SS is selected comprehensively, so that both the problem of overlapping reading and the problem of skipping can be improved.
In addition, according to the above-described embodiment, the second score calculation unit 130 calculates the skipping score S22 by using the character-likeliness map CLM. The character-likeliness map CLM indicates the likelihood that some character C exists in the region of the input image IM. According to the present embodiment, skipping can be easily prevented.
An example of a character recognition device 10A according to a second embodiment will be described with reference to
Specifically, the character recognition device 10A first selects one or more selection character strings SS for the partial input image IMP-1 in the input image IM. Next, the character recognition device 10A selects one or more selection character strings SS for the partial input image IMP-1 and the partial input image IMP-2. In this case, since one or more selection character strings SS have already been selected for the partial input image IMP-1, the number of candidate character strings CS for the partial input image IMP-1 and the partial input image IMP-2 is reduced. Further, the character recognition device 10A selects a final selection character string SS for the partial input image IMP-1, the partial input image IMP-2, and the partial input image IMP-3. In this case, since one or more selection character strings SS have already been selected for the partial input image IMP-1 and the partial input image IMP-2, the number of candidate character strings CS for the partial input image IMP-1, the partial input image IMP-2, and the partial input image IMP-3 is reduced. In this way, in the present embodiment, the overall processing time is shortened by narrowing down the candidate character strings CS for each partial input image IMP.
” is written in the input image IM will be described with reference to the drawing.
In ” as a candidate character string CS-11, “
1” as a candidate character string CS-12, and “
” as a candidate character string CS-13. The consistency scores of the candidate character strings CS are “1.0,” “0.3,” and “1.0.” The character recognition device 10A selects the candidate character string CS-11 which is a plausible character string and the candidate character string CS-13 as the selection character strings SS. In other words, the character recognition device 10A excludes the candidate character string CS-12 from the candidates.
In ” as a candidate character string CS-21, “
” as a candidate character string CS-22, “
” as a candidate character string CS-23, “
” as a candidate character string CS-24, “
” as a candidate character string CS-25, and “
” as a candidate character string CS-26. The consistency scores of the candidate character strings CS are “0.1,” “1.0,” “0.1,” “1.0,” “1.0,” and “1.0.” The character recognition device 10A selects the candidate character string CS-22, the candidate character string CS-24, the candidate character string CS-25, and the candidate character string CS-26 which are plausible character strings as the selection character strings SS. In other words, the character recognition device 10A excludes the candidate character string CS-21 and the candidate character string CS-23 from the candidates. Here, in the examination of the partial input image IMP-1, “
1” which is the candidate character string CS-12 is excluded from the candidates, and thus in the examination of the partial input image IMP-1 and the partial input image IMP-2, the number of candidate character strings CS can be reduced.
(Step S210) The character recognition device 10A sets x to δ. Here, δ is a predetermined integer indicating the range of the partial input image IMP. In addition, x indicates the range in which the character recognition device 10A recognizes characters. In this flowchart, the character recognition device 10A first calculates the candidate character string CS for a range from 0 to x. Here, x, which is the range in which the character recognition device 10A recognizes characters, is equivalent to the partial input image IMP in the example described with reference to
(Step S220) The character recognition device 10A sets an empty set to a candidate set Φ.
(Step S230) The first score calculation unit 110 included in the character recognition device 10A calculates the first score S1 for each of the plurality of candidate character strings CS included in the partial input image IMP. That is, in a case where the candidate character string CS is yn and the first score S1 is αn, the first score calculation unit 110 calculates (y1, α1), . . . , (yn, αn).
(Step S240) The character recognition device 10A selects the selection character string SS in the partial input image IMP. Specifically, the character recognition device 10A selects R pairs of yi and γi, where γi is large, and sets these pairs as the candidate set Φ. R is the number of character strings to be candidates in a case where character recognition is performed on the next partial input image IMP. If R is made to be small, the processing time can be shortened, but if it is too small, the possibility of erroneous recognition may increase.
(Step S250) The character recognition device 10A determines whether character recognition has been performed on all of the input images IM. Specifically, in a case where x is smaller than W, the character recognition device 10A advances the process to step S270. In a case where x is not smaller than W, the character recognition device 10A advances the process to step S260.
(Step S270) The character recognition device 10A extends the range in which character recognition is performed. Specifically, the character recognition device 10A sets the value obtained by adding δ to x as x, and advances the process to step S230.
(Step S260) The character recognition device 10A outputs the character string yk for which γk becomes maximum as the selection character string SS.
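The loop of steps S210 to S270 is essentially a beam search over extending prefixes of the input image. A minimal sketch follows, assuming hypothetical helpers `expand` (which proposes candidate strings for the current range) and `score` (which returns the combined score γ for a candidate over that range); the beam width R and step δ follow the text.

```python
# Beam-search sketch of steps S210-S270. "expand" and "score" are
# hypothetical helpers standing in for candidate generation and for
# the combined score gamma; they are not defined by the embodiment text.

def beam_search(width_w, delta, r, expand, score):
    x = delta                               # step S210: initial range [0, x)
    beam = [""]                             # step S220: empty candidate set
    while True:
        candidates = expand(beam, x)        # candidates over columns [0, x)
        scored = sorted(((score(y, x), y) for y in candidates), reverse=True)
        beam = [y for _, y in scored[:r]]   # step S240: keep top R by gamma
        if x >= width_w:                    # step S250: whole image processed?
            return beam[0]                  # step S260: best character string
        x += delta                          # step S270: extend the range
```

As noted for step S240, a smaller R shortens processing time at the cost of a higher chance that the correct string is pruned early.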
According to the above-described embodiment, the first score calculation unit 110 included in the character recognition device 10A calculates the first score S1 for the partial input image IMP which is a portion of the input image IM. In other words, the first score calculation unit 110 calculates the first score S1 for the candidate character string CS which is a candidate for the character string S including some characters among a plurality of characters C that constitute the character string S included in the input image IM. In addition, the second score calculation unit 130 included in the character recognition device 10A calculates the second score S2 for the partial input image IMP which is a portion of the input image IM. In other words, the second score calculation unit 130 calculates the second score S2 for the candidate character string CS which is a candidate for the character string S including some characters among a plurality of characters C that constitute the character string S included in the input image IM. Since the character recognition device 10A calculates the candidate character string CS for each portion of the input image IM, it is possible to reduce the number of candidates for the entire character string included in the input image IM. Thus, according to the present embodiment, it is possible to perform character recognition with less time and resources by using a beam search algorithm.
An example of a character recognition device 10B according to a third embodiment will be described with reference to
The input image IM shown in
The character C is written in the input image IM shown in ,” the character C-2 is “
,” and the character C-3 is “
.”
” and “
.”
In a case where the candidate character string CS is “,” the character region CA1-1 is included in the character input region IAR1, the character region CA2-1 and the character region CA3-1 are included in the character input region IAR2, and a character region CA4-1 is included in the character input region IAR3. In this case, since the character region CA2-1 and the character region CA3-1 are included in the character input region IAR2, two character regions CA exist in one frame (character input region IAR). In a case where a plurality of character regions CA exist in one frame in this way, the second score calculation unit 130 calculates a consistency score PBOX on the basis of the following Expression (4), with the amount of overlapping between the smaller character region CA and the frame region as m(y).
Here, CBP is a constant between 0 and 1, and as this value decreases, the consistency score PBOX becomes small. The value of CBP may be obtained experimentally.
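Since Expression (4) is not reproduced in this text, the following sketch assumes, by analogy with the overlapping reading score, the form PBOX = CBP^m(y). The one-dimensional frames and regions, and the way m(y) is accumulated, are illustrative assumptions.

```python
# Assumed frame-consistency score: P_BOX = C_BP ** m(y), where m(y) is
# accumulated from frames that contain more than one character region,
# using the overlap of the smaller region with the frame. Frames and
# regions are simplified to horizontal intervals (x0, x1).

def frame_consistency(frames, regions, c_bp=0.5):
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    m_y = 0.0
    for f in frames:
        inside = [r for r in regions if overlap(r, f) > 0]
        if len(inside) > 1:                          # two+ regions in one frame
            smallest = min(inside, key=lambda r: r[1] - r[0])
            m_y += overlap(smallest, f)              # penalize the smaller region
    return c_bp ** m_y
```

A frame holding exactly one character region contributes no penalty, so the score stays at 1.0 for a spatially consistent candidate.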
Here, an example of a case in which the selection unit 140 further selects the selection character string SS on the basis of the consistency score PBOX will be described. In this case, the selection unit 140 selects the selection character string SS on the basis of the following Expression (5).
That is, in the present embodiment, the second score calculation unit 130 calculates the second score S2 on the basis of the character region CA which is a region corresponding to each character C included in the candidate character string CS and the character input region IAR.
According to the above-described embodiment, the character recognition device 10B includes the second score calculation unit 130 to calculate the second score S2 on the basis of the character region CA and the character input region IAR. For example, in a case where a plurality of characters are included in one frame, the second score calculation unit 130 calculates a low value for the second score S2, which makes it possible to prevent erroneous recognition such as recognizing characters by dividing them into left-hand sides, radicals, and the like of kanji. Therefore, according to the present embodiment, the character C included in the input image IM can be recognized more correctly.
A fourth embodiment will be described with reference to
” is handwritten horizontally from left to right will be described.
The neural network NN1 calculates a series of feature amounts F of an input character string. In a case where the input data DI is an image of a character string handwritten horizontally from left to right, the neural network NN1 recognizes a series of feature amounts F from left to right by the width of the determination range. In this example, the neural network NN1 calculates feature amounts from a feature amount F1 to a feature amount F6. Here, the neural network NN1 calculates the feature amount F in a number corresponding to the length of the rows of the input data DI.
A neural network NN2 calculates a probability distribution P for each feature amount F calculated by the neural network NN1. In this example, the neural network NN1 calculates feature amounts from the feature amount F1 to the feature amount F6, and thus the neural network NN2 calculates a probability distribution P1 corresponding to the feature amount F1 to a probability distribution P6 corresponding to the feature amount F6.
A connectionist temporal classification (CTC) 80 integrates each of the calculated probability distributions, calculates the probability distribution P of a character string corresponding to the input data DI, and outputs a character string recognized from the calculated probability distribution P as output data DO.
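The per-frame distributions P1 to P6 can be reduced to an output string with the standard CTC collapse rule: take the best label at each frame, merge consecutive repeats, then drop blanks. The sketch below shows only that greedy rule; the CTC 80 described above integrates the full probability distributions, so this is a simplification for illustration:

```python
def ctc_greedy_decode(prob_dists, labels, blank=0):
    """Greedy CTC decoding: per-frame argmax, collapse consecutive
    repeats, remove the blank label.

    `prob_dists` is a list of per-frame probability distributions
    (one list of floats per feature amount F); `labels[i]` is the
    character for label index i, with `labels[blank]` the CTC blank.
    """
    out, prev = [], None
    for dist in prob_dists:
        idx = max(range(len(dist)), key=dist.__getitem__)  # argmax label
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)
```

For six frames whose argmax sequence is, say, `a a - b b -` (with `-` the blank), the decoded string is `ab`: the repeated frames collapse to one label each, matching how one output label can be associated with a plurality of feature amounts F.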
An estimation unit 85 acquires the feature amount F calculated by the neural network NN1. The estimation unit 85 estimates a range in which an element to be assigned a predetermined label can exist from the acquired feature amount F using a neural network NN3.
The estimation unit 85 associates each label of the output data DO recognized by the CTC 80 with each of the feature amounts F. In a case where one label in a sequence of labels of the output data DO is associated with a plurality of feature amounts F, the estimation unit 85 integrates and outputs the range estimated from the plurality of feature amounts F associated with the one label. The output result output by the estimation unit 85 specifies the range of each label in the input data DI. In an example shown in the drawing, a range A1 specifies the range of “”, a range A2 specifies the range of “”, and a range A3 specifies the range of “”.
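The integration of ranges estimated from several feature amounts F tied to one label can be sketched as follows, assuming each per-frame range is a (start, end) pair and the alignment marks frames bound to no label (for example, CTC blanks) as None; the function name and data layout are illustrative:

```python
def merge_label_ranges(frame_ranges, alignment):
    """Integrate per-frame ranges into one range per output label.

    `frame_ranges[t]` is the (start, end) range estimated from the
    t-th feature amount F; `alignment[t]` is the output-label index
    assigned to that frame, or None for frames tied to no label.
    Frames sharing a label are merged into one enclosing range.
    """
    merged = {}
    for rng, label in zip(frame_ranges, alignment):
        if label is None:
            continue
        if label in merged:
            lo, hi = merged[label]
            merged[label] = (min(lo, rng[0]), max(hi, rng[1]))
        else:
            merged[label] = rng
    return [merged[k] for k in sorted(merged)]
```

With six frame ranges and the alignment `[0, 0, None, 1, 1, 2]`, the first two frames merge into one range for the first label, the fourth and fifth merge for the second label, and the last frame alone yields the third range, analogous to the ranges A1 to A3 above.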
First, the input data DI is input to a neural network NN4. In this example, a case where the input data DI is an image of a character string in which “” is handwritten horizontally from left to right will be described. The neural network NN4 is a detection deep neural network (DNN), which receives an image as an input and outputs a plurality of candidate rectangles R and the score of a character corresponding to each of the candidate rectangles R.
Specifically, the neural network NN4 outputs a candidate rectangle R1, a candidate rectangle R2, a candidate rectangle R3, a candidate rectangle R4, a candidate rectangle R5, a candidate rectangle R6, and the score of a character corresponding to each of the candidate rectangles R. More specifically, the neural network NN4 outputs a score of “0.8” for the character “” and a score of “0.1” for the character “” corresponding to the candidate rectangle R1, a score of “0.5” for the character “” and a score of “0.2” for the character “” corresponding to the candidate rectangle R2, a score of “0.3” for the character “1” and a score of “0.1” for the character “” corresponding to the candidate rectangle R3, a score of “0.8” for the character “” and a score of “0.1” for the character “” corresponding to the candidate rectangle R4, a score of “0.5” for the character “” and a score of “0.1” for the character “” corresponding to the candidate rectangle R5, and a score of “0.7” for the character “” and a score of “0.1” for the character “” corresponding to the candidate rectangle R6.
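Reducing the per-rectangle scores to a single hypothesis per candidate rectangle R can be sketched as below; the character keys are placeholders, since the glyphs themselves do not survive in this text, and the function is an illustration rather than the NN4's own post-processing:

```python
def best_characters(rect_scores):
    """For each candidate rectangle R, keep the character with the
    highest score.

    `rect_scores` maps a rectangle name to a dict of
    {character: score}, mirroring the NN4's per-rectangle output.
    """
    return {rect: max(scores, key=scores.get)
            for rect, scores in rect_scores.items()}
```

For example, given `{"R3": {"1": 0.3, "other": 0.1}}`, the rectangle R3 resolves to the character “1”, matching the scores listed above.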
According to the above-described embodiment, the character region estimation unit 120 includes the estimation unit 85, which estimates a range in which a character C can exist on the basis of the feature amounts F acquired from the input data DI, associates the character C with at least one of the plurality of feature amounts F, and specifies the region corresponding to each character C by integrating the plurality of ranges associated with one label. By using the present embodiment, it is possible to perform an efficient search using a beam search algorithm. In addition, the estimation of a character region according to the present embodiment can be easily implemented.
In addition, according to the above-described embodiment, the character region estimation unit 120 receives an image as an input and outputs a plurality of candidate rectangles R and the score of a character corresponding to each of the candidate rectangles R. By using the present embodiment, it is possible to estimate a character region with fewer resources.
A fifth embodiment will be described with reference to the drawings.
Alternatively, the character-likeliness map CLM may be obtained by dividing the input image IM into grid-shaped small regions and taking the total number of black pixels in each small region.
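The grid-based variant can be sketched directly, assuming a binary image in which 1 marks a black pixel and dimensions that are multiples of the cell size (the function name and the default cell size are illustrative):

```python
def black_pixel_map(image, cell=8):
    """Character-likeliness map from black-pixel counts.

    Divides a binary image (a list of rows, 1 = black pixel) into
    `cell` x `cell` grid regions and returns, for each region, the
    total number of black pixels, as in the grid-based variant of
    the character-likeliness map CLM. Assumes the image height and
    width are multiples of `cell`.
    """
    h, w = len(image), len(image[0])
    return [[sum(image[r][c]
                 for r in range(gr * cell, (gr + 1) * cell)
                 for c in range(gc * cell, (gc + 1) * cell))
             for gc in range(w // cell)]
            for gr in range(h // cell)]
```

Regions covering inked strokes receive high counts while background regions receive counts near zero, so the resulting grid serves as a coarse character-likeliness map.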
The character-likeliness calculation neural network DNN is a neural network that has been trained in advance so as to predict character likeliness.
In an example shown in
According to the above-described embodiment, the character-likeliness map generation unit 1321 can easily generate the character-likeliness map CLM by using the character-likeliness map CLM1, the character-likeliness map CLM2, or the character-likeliness map CLM3.
In addition, according to the above-described embodiment, the character-likeliness map generation unit 1321 includes the character-likeliness calculation neural network DNN, and can therefore generate the character-likeliness map CLM through machine learning. The use of machine learning makes the generation robust to noise and helps prevent erroneous recognition. In addition, the use of machine learning also makes it possible to correctly recognize an input image IM having a different background.
As described above, a plurality of embodiments and a plurality of modification examples have been described. These may be combined and implemented insofar as it is possible to combine them.
Meanwhile, the functions of the information processing device in the above-described embodiments may be realized by a computer. In that case, these functions may be realized by recording a program for realizing the functions on a computer-readable recording medium, and causing a computer system to read and execute the program recorded on this recording medium. The term “computer system” referred to here is assumed to include an OS and hardware such as peripheral devices. In addition, the term “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disc, a ROM, a CD-ROM, a DVD-ROM, or a USB memory, or a storage device such as a hard disk built into a computer system. Further, the “computer-readable recording medium” is assumed to include recording media that dynamically hold a program for a short period of time, such as networks like the Internet or communication lines such as a telephone line when a program is transmitted through them, and recording media that hold a program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. In addition, the above-mentioned program may be a program for realizing a portion of the aforementioned functions, or may be a program capable of realizing the aforementioned functions in combination with a program already recorded in the computer system.
According to at least one of the embodiments described above, a first score calculation unit, a character region estimation unit, a second score calculation unit, and a selection unit are included, and thus it is possible to correctly recognize characters included in an input image.
While several embodiments of the present invention have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the invention. Indeed, these embodiments described herein may be embodied in a variety of other forms, and furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the present invention. The appended claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present invention.
Number | Date | Country
---|---|---
Parent PCT/JP2022/027500 | Jul 2022 | WO
Child 19014383 | | US