The present invention relates to a character recognition technology, and particularly, to a character recognition apparatus and a character recognition method for recognizing characters in an image.
Character recognition technology is widely used in many fields of everyday life, including the recognition of characters in still images and in dynamic images (video images). One kind of video image, the lecture video, is commonly used in e-Learning and in other educational and training environments. In a typical lecture video, a presenter uses a slide image as the background while he or she speaks. There is usually a great amount of text information in lecture videos, and this information is very useful for content generation, indexing, and searching.
The recognition performance for characters in lecture videos is rather low, because the character images to be recognized are usually blurred and small in size, whereas the dictionary used in recognition is obtained from original clean character images.
In the prior art, characters in lecture videos are recognized in the same way as characters in a scanned document: the characters are segmented and then recognized using a dictionary made from original clean character images.
There are many papers and patents on synthetic character image generation, such as:
P. Sarkar, G. Nagy, J. Zhou, and D. Lopresti, "Spatial sampling of printed patterns," IEEE PAMI, 20(3): 344-351, 1998;
E. H. Barney Smith and X. H. Qiu, "Relating statistical image differences and degradation features," LNCS 2423: 1-12, 2002;
T. Kanungo, R. M. Haralick, and I. Phillips, "Global and local document degradation models," Proceedings of the IAPR 2nd International Conference on Document Analysis and Recognition, Tsukuba, Japan, 1993, pp. 730-734; and
H. S. Baird, "Generation and use of defective images in image analysis," U.S. Pat. No. 5,796,410.
However, to date there has been no report on video character recognition using synthetic patterns.
Arai Tsunekazu, Takasu Eiji, and Yoshii Hiroto published a patent entitled "Pattern recognition apparatus which compares input pattern features and size data to registered feature and size pattern data, an apparatus for registering feature and size data, and corresponding methods and memory media therefore" (U.S. Pat. No. 6,421,461). In this patent, the inventors also extracted the size information of the characters to be tested, but they used this information only for comparison with the size information stored in a dictionary.
Therefore, there is a need to make improvements over the prior art so as to enhance the recognition performance for characters.
It is one object of the present invention to solve the above problems in the prior art, namely to improve the recognition performance when recognizing characters in an image.
According to the present invention, there is provided a character recognition apparatus for recognizing characters in an image, comprising:
a text line extraction unit for extracting a plurality of text lines from an input image;
a feature recognition unit for recognizing one or more features of each of the text lines;
a synthetic pattern generation unit for generating synthetic character images for each of the text lines by using the features recognized by the feature recognition unit and original character images;
a synthetic dictionary generation unit for generating a synthetic dictionary for each of the text lines by using the synthetic character images; and
a text line recognition unit for recognizing characters in each of the text lines by using the synthetic dictionary.
According to the present invention, there is further provided a character recognition method for recognizing characters in an image, comprising the steps of:
extracting text lines from an input image;
recognizing one or more features of each of the text lines;
generating synthetic character images for each of the text lines by using the recognized features and original character images;
generating a synthetic dictionary for each of the text lines by using the synthetic character images; and
recognizing characters in each of the text lines by using the synthetic dictionary.
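The method steps above can be sketched as a simple per-line processing loop. The sketch below is purely illustrative: every helper in it is a hypothetical toy stand-in for the corresponding unit (the real extraction, feature recognition, and matching methods are not reproduced here), shown only to make the data flow of the claimed pipeline concrete.

```python
# Illustrative sketch of the claimed pipeline. All helpers below are
# hypothetical toy stand-ins, not the actual units of the invention.

def extract_text_lines(image):
    # Toy stand-in: treat each row of the "image" as one text line.
    return image

def recognize_features(line):
    # Toy stand-in: use the line's mean value as a contrast-like feature.
    return {"contrast": sum(line) / len(line)}

def generate_synthetic_chars(clean_chars, features):
    # Toy stand-in: degrade each clean pattern by scaling with the contrast.
    scale = features["contrast"]
    return {label: [v * scale for v in pattern]
            for label, pattern in clean_chars.items()}

def build_synthetic_dictionary(synthetic_chars):
    # Toy stand-in: use the degraded patterns directly as the dictionary.
    return synthetic_chars

def recognize_text_line(line, dictionary):
    # Toy stand-in: nearest-neighbour match of the whole line against each
    # dictionary entry (a real system would segment into characters first).
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(dictionary, key=lambda label: dist(line, dictionary[label]))

def recognize_characters(image, clean_chars):
    results = []
    for line in extract_text_lines(image):               # step 1
        features = recognize_features(line)               # step 2
        synthetic = generate_synthetic_chars(clean_chars, features)  # step 3
        dictionary = build_synthetic_dictionary(synthetic)           # step 4
        results.append(recognize_text_line(line, dictionary))        # step 5
    return results
```

Note that a fresh synthetic dictionary is built for each text line, which is the key point of the method: the dictionary is adapted to the features of the very text it is used to recognize.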
In the present invention, by extracting beforehand certain features of the text to be recognized, and synthesizing these features with original character images to get synthetic characters and hence a synthetic dictionary, characters can be recognized by using a synthetic dictionary suitable for the text to be recognized. Consequently, the recognition performance for characters can be markedly improved.
In the present invention, a text frame extraction unit is first used to extract video frames that contain text information. Then, a frame text recognition unit is used to recognize the character content in each frame image. In the frame text recognition unit, a font type identification unit is used to identify the font types of the characters in the frame image. A text line extraction unit is used to extract all the text lines from each of the text frame images. A contrast estimation unit is used to estimate the contrast value of each of the text line images. A shrinking level estimation unit is used to estimate the number of patterns to be generated for each of the original patterns. Then, a synthetic pattern generation unit is used to generate a group of synthetic character patterns by using the estimated font type and contrast information. These synthetic character images are used to make a synthetic dictionary for each of the text lines. Finally, a character recognition unit is used to recognize the characters in each of the text lines by using the generated synthetic dictionaries.
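One plausible way to generate such synthetic character patterns is sketched below. The specific degradation model (a box-filter downscale to simulate shrinking, followed by contrast compression toward the background) is an illustrative assumption chosen for this sketch, not a model taken from the embodiment; the function names are likewise hypothetical.

```python
# Illustrative degradation sketch: shrink a clean gray pattern by box
# averaging, then compress its dynamic range to a target contrast.
# The model and all names here are assumptions, not the patented method.

def shrink(pattern, factor):
    """Downscale a 2-D gray pattern by averaging factor x factor blocks."""
    h, w = len(pattern), len(pattern[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [pattern[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def apply_contrast(pattern, contrast, background=255.0):
    """Compress the range toward the background so that a full-black
    pixel ends up `contrast` gray levels away from the background."""
    scale = contrast / 255.0
    return [[background - (background - v) * scale for v in row]
            for row in pattern]

def make_synthetic_pattern(clean_pattern, contrast, shrink_factor):
    # Combine both degradations: first shrink, then reduce contrast.
    return apply_contrast(shrink(clean_pattern, shrink_factor), contrast)
```

By sweeping the shrink factor over the estimated shrinking levels and fixing the contrast at the estimated value, one synthetic pattern set per text line can be produced from the same clean character images.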
For each of the detected text lines, given the estimated font types and contrast value, a synthetic pattern generation unit 207 is used to generate a set of synthetic character images from a set of clean character pattern images. Then, a synthetic dictionary generation unit 208 is used to generate a synthetic dictionary from the output of unit 207. After that, a text line recognition unit 209 is used to recognize the characters in the text line by using the generated synthetic dictionary. A combination of the recognized text line contents of all text lines constitutes the text content 105 in
The specific method used in the text line extraction unit 201 is described in Jun Sun, Yutaka Katsuyama, and Satoshi Naoi, "Text processing method for e-Learning videos," IEEE CVPR Workshop on Document Image Analysis and Retrieval, 2003.
The histogram is smoothed by a moving-average operation:

prjs(i) = ( Σ j=i−δ to i+δ prj(j) ) / (2δ+1)

where prjs(i) is the smoothed value for position i, δ is the window size for the smoothing operation, and j is the current position during the smoothing operation. In the smoothed histogram, the positions of the maximum value and the minimum value are recorded (S303, S304). Then the contrast value is calculated as the difference between the two positions (S305).
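The contrast estimation steps S303 to S305 can be sketched as follows. The boundary handling (clamping the window at the histogram ends and dividing by the actual window length) is an illustrative assumption of this sketch, as is the choice of histogram source.

```python
# Sketch of the contrast estimation described above: smooth the histogram
# with a moving-average window of half-width delta, then take the
# difference between the positions of its maximum and minimum values.
# Boundary handling here is an illustrative assumption.

def smooth_histogram(prj, delta):
    n = len(prj)
    prjs = []
    for i in range(n):
        lo, hi = max(0, i - delta), min(n - 1, i + delta)  # clamp at borders
        window = prj[lo:hi + 1]
        prjs.append(sum(window) / len(window))
    return prjs

def estimate_contrast(prj, delta=2):
    prjs = smooth_histogram(prj, delta)
    pos_max = prjs.index(max(prjs))    # position of the maximum value (S303)
    pos_min = prjs.index(min(prjs))    # position of the minimum value (S304)
    return abs(pos_max - pos_min)      # difference of the two positions (S305)
```

Because the positions are taken on the histogram axis, their difference directly measures how far apart the dominant bright and dark levels of the text line lie.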
For a given text frame image, the recognition result for all the text lines in the image constitutes the recognition result of the content of this image. Finally, the combination of all the results in 105 constitutes the final output of the present invention, namely the recognition result of the lecture video.
It should be pointed out that, although the character recognition technology according to the present invention has been explained above with reference to a lecture video image, the character recognition technology of the present invention is also applicable to other types of video images. Moreover, the character recognition technology of the present invention can likewise find application in still images such as scanned documents and photographs. Additionally, in the embodiments of the present invention, the features extracted from the text line to be recognized during the process of obtaining a synthetic dictionary are contrast, font, and shrinking rate. However, the extracted features are not limited to these, since it is also possible to additionally or alternatively extract other features of the text line.
Foreign Application Priority Data: Application No. 200410058334.0, filed August 2004, China (national).