Claims
- 1. A method for automatically determining one or more languages associated with text in a bit-mapped image, comprising the steps of:
segmenting the image into a plurality of word-token images;
recognizing separate characters in said word-token images;
joining the separate characters into character groups, each presumably comprising a word;
forming at least one hypothesis that a character group, presumably comprising a word, corresponds to a certain language; and
accepting the hypothesis that the character group, presumably comprising a word, corresponds to the certain language;
wherein said step of forming the hypothesis further comprises at least the following steps:
defining a set of selected language models; and
estimating the correspondence of the word to lingual and non-lingual models.
- 2. The method of claim 1, wherein the step of recognizing separate characters in said word-token images is performed by a classifier that is generic to each of said one or more languages.
- 3. The method of claim 1, wherein the step of accepting the hypothesis that the character group, presumably comprising a word, corresponds to a certain language further comprises:
defining a set of dictionaries for estimating the correspondence of the word to a certain language; and
estimating the correspondence of the word against the defined dictionaries.
- 4. The method of claim 3, wherein the set of dictionaries for estimating the language correspondence of the text is defined manually.
- 5. The method of claim 3, wherein the set of dictionaries for estimating the language correspondence of the text is defined automatically.
- 6. The method of claim 1, wherein the step of accepting the hypothesis that the character group, presumably comprising a word, corresponds to a certain language further comprises calculating a complex estimation, said complex estimation including at least:
a character recognition quality estimation; and
a dictionary conformity estimation, including a language model conformity estimation.
- 7. The method of claim 6, wherein the complex estimation further comprises calculating a special factor of mutual correspondence of characters.
- 8. The method of claim 6, wherein the complex estimation further comprises calculating a special factor of relative placement of words.
- 9. The method of claim 7, wherein the complex estimation further comprises calculating a special factor of word correspondence.
- 10. The method of claim 9, wherein the special factor comprises geometric conformity of characters within the word.
- 11. The method of claim 9, wherein the special factor comprises geometric conformity of characters within the line.
- 12. The method of claim 9, wherein the special factor comprises a linguistic correspondence of the word with its neighbors.
- 13. The method of claim 9, wherein the special factor includes an estimation of the accuracy of reconstructing a word from the token image, including in the presence of distortion.
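The hypothesis forming and accepting steps of claims 1, 3, and 6 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the toy dictionaries, the equal weights, the acceptance threshold, and every function name are hypothetical, and the dictionary conformity estimation is reduced to a simple membership test. The character confidences are assumed to come from an upstream character classifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical toy dictionaries; a real system would load full lexicons.
DICTIONARIES: Dict[str, Set[str]] = {
    "english": {"the", "word", "image"},
    "german": {"das", "wort", "bild"},
}


@dataclass
class Hypothesis:
    word: str
    language: str
    score: float


def dictionary_conformity(word: str, language: str) -> float:
    """Return 1.0 if the word is in the language's dictionary, else 0.0."""
    return 1.0 if word.lower() in DICTIONARIES[language] else 0.0


def complex_estimation(char_confidences: List[float], word: str, language: str,
                       w_rec: float = 0.5, w_dict: float = 0.5) -> float:
    """Weighted sum of recognition quality and dictionary conformity.

    The weights are illustrative assumptions, not values from the source.
    """
    if not char_confidences:
        recognition_quality = 0.0
    else:
        recognition_quality = sum(char_confidences) / len(char_confidences)
    return w_rec * recognition_quality + w_dict * dictionary_conformity(word, language)


def accept_hypothesis(word: str, char_confidences: List[float],
                      languages: List[str],
                      threshold: float = 0.6) -> Optional[Hypothesis]:
    """Form one hypothesis per language and accept the best one above threshold."""
    hypotheses = [
        Hypothesis(word, lang, complex_estimation(char_confidences, word, lang))
        for lang in languages
    ]
    best = max(hypotheses, key=lambda h: h.score)
    return best if best.score >= threshold else None


if __name__ == "__main__":
    print(accept_hypothesis("Wort", [0.9, 0.8, 0.9, 0.8], ["english", "german"]))
```

In this sketch a character group found in no dictionary yields no accepted hypothesis, which roughly stands in for the non-lingual case of claim 1. The special factors of claims 7 to 13 would enter as further weighted terms of the sum.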
Priority Claims (1)
Number | Date | Country | Kind
2002127826 A | Jul 2002 | RU |
Parent Case Info
[0001] References Cited
U.S. Patent Documents
Patent No. | Date | Inventor | U.S. Class
3988715 | October, 1976 | Mullan et al. | 382/228
4829580 | May, 1989 | Church | 704/9
5062143 | October, 1991 | Schmitt | 704/9
5182708 | January, 1993 | Ejiri | 704/9
5371807 | December, 1994 | Register et al. | 704/9
5418951 | May, 1995 | Damashek | 704/9
5548507 | August, 1996 | Martino et al. | 704/9
6047251 | Apr. 4, 2000 | Pon et al. | 382/229
6370269 | Apr. 9, 2002 | Al-Karmi et al. | 382/197