Structural description of a document, a method of describing the structure of graphical objects and methods of object recognition.

Information

  • Patent Application
  • 20070172130
  • Publication Number
    20070172130
  • Date Filed
    August 01, 2006
  • Date Published
    July 26, 2007
Abstract
The invention deals with the processing of machine-readable forms of non-fixed format. It comprises a structural description of the characteristics of document elements, a method of describing the logical structure of a document, and methods of searching for elements of a document with the use of the structural description. The structural description of the spatial and parametric characteristics of document elements and of the logical connections between elements comprises the hierarchical logical structure of the elements, a specification of an algorithm for determining the search constraints, a specification of the characteristics of every searched element, and a specification of the parameter set for identifying a compound element on the basis of the aggregate of its components. The method of describing the logical structure of a document and the methods of searching for elements of a document are based on the use of the structural description.
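As an illustration only, the hierarchical structural description outlined in the abstract could be modeled roughly as follows. This is a minimal sketch, not the applicant's implementation: the class name `ElementDescription`, its fields, the interval representation, and the sample element names are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Assumption: every characteristic is constrained by an inclusive (low, high) interval.
Interval = Tuple[float, float]


@dataclass
class ElementDescription:
    """Hypothetical node of a hierarchical structural description."""
    name: str
    # Spatial characteristics, e.g. {"left": (0, 100)}.
    spatial: Dict[str, Interval] = field(default_factory=dict)
    # Parametric characteristics, e.g. {"height": (10, 14)}.
    parametric: Dict[str, Interval] = field(default_factory=dict)
    # Constituents; a non-empty list makes this a compound element.
    children: List["ElementDescription"] = field(default_factory=list)

    @property
    def is_compound(self) -> bool:
        return bool(self.children)

    def matches(self, measured: Dict[str, float]) -> bool:
        """True if every constrained characteristic is present and within its interval."""
        constraints = {**self.spatial, **self.parametric}
        return all(
            key in measured and low <= measured[key] <= high
            for key, (low, high) in constraints.items()
        )

    def walk(self) -> Iterator["ElementDescription"]:
        """Yield this element and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# Hypothetical compound element built from two simple elements.
label = ElementDescription(
    "date_label", spatial={"left": (0, 100)}, parametric={"height": (10, 14)}
)
value = ElementDescription("date_value", spatial={"left": (100, 300)})
date_field = ElementDescription("date_field", children=[label, value])
```

Here a compound element is simply a node with children, and exact values can be expressed as degenerate intervals such as `(12, 12)`.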
Claims
  • 1. A structural description of the spatial, parametric characteristics of an element and logical connections thereof with other elements of a non-fixed layout document, comprising an assigned description of logical connections with other elements, an assigned description of spatial characteristics of the element, an assigned description of parametric characteristics of the element, an assigned algorithm of determining the element search constraints, and an assigned set of parameters for identification of a compound element on the basis of the aggregate of constituents.
  • 2. The structural description, as recited in claim 1, further comprising the setting of an algorithm of estimating the quality of an obtained variant of an element.
  • 3. The structural description, as recited in claim 2, wherein the algorithm of estimating the quality of an obtained variant of an element is set in the form of a reference table.
  • 4. The structural description, as recited in claim 2, wherein the algorithm of estimating the quality of an obtained variant of an element is set in the form of a graph or formula.
  • 5. The structural description, as recited in claim 1, further optionally comprising a specification of an auxiliary brief description for determination of the spatial orientation of the image.
  • 6. The structural description, as recited in claim 1, further optionally comprising a specification of an auxiliary brief description to quickly select the type of the document and/or its comprehensive description from several preliminarily specified descriptions.
  • 7. A method of specifying the logical structure of the document, comprising: preliminary specification of the list and description of all varieties of the elements which may be present on the form; creation of the structure of the elements' logical connections; creation of the structure of the elements' disposition; assignment of the structure as the disposition of simple and compound elements; assignment of the structure representation as the interrelations between simple and compound elements; assignment of the algorithm of specifying the search constraints of each element; specification of the set of at least the following characteristics for each simple and compound element search: the spatial characteristics of the search area; the parametric characteristics of the element, description of methods of identification of the obtained elements, determination of the type of the element, determination of the distinctive properties of each element type, and testing the completeness of the composition of parts of the compound element, said methods using the following information: values of the absolute spatial characteristics of the element and/or values of the relative spatial characteristics of the element; values of the parametric characteristics of the element; a rule of assigning quality ratings to obtained elements, description of a method of decreasing the number of variants of a compound element composition, and a method of accelerating the search for the best variant thereof.
  • 8. The method of specifying the logical structure of a document, as recited in claim 7, wherein the spatial characteristics of an element are included in the set of search characteristics thereof.
  • 9. The method of specifying the logical structure of a document, as recited in claim 7, wherein the spatial and parametric characteristics are represented as exact values.
  • 10. The method of specifying the logical structure of a document, as recited in claim 7, wherein the spatial and parametric characteristics are represented as intervals of values.
  • 11. The method of specifying the logical structure of a document, as recited in claim 7, wherein one or several earlier obtained objects, or one or several obtained lines, or one or several points, or one or several borders of the document are assigned as the reference points for the relative spatial characteristics.
  • 12. The method of specifying the logical structure of a document, as recited in claim 7, wherein the hierarchical structure of connections between the elements is set.
  • 13. The method of specifying the logical structure of a document, as recited in claim 7, wherein the method of decreasing the number of variants of the composition of a compound element and accelerating the search process further comprises: assigning a number of variants with the best quality estimates which will be kept for further analysis to each type of the element; performing a search for the best variant of a compound element, taking into account the best total quality of its accountable composite parts, regardless of their number.
  • 14. A method of searching for elements of a form with the use of a structural description, comprising at least the following steps: obtaining the structural description of the form; searching for objects on the image; allocating the obtained objects; revealing the text objects to be mandatorily recognized, and determining the minimal required scope of recognition; performing recognition of said text objects; performing the search for elements of the form, comprising at least the following steps: selecting a searched element in the structural description; gaining the algorithm of obtaining the search constraints from the structural description; performing the search of the element on the form image; examining the obtained variants; optimizing the variants revision of the compound element components combinations, said search for an element comprising the use of the following characteristics: spatial characteristics of the search area; parametric characteristics of the element; absolute and/or relative spatial characteristics of the element represented as exact values and/or as intervals of values; results of preliminary text recognition, said examination of the obtained variants of elements comprising the following steps: identifying the obtained variant of the element; estimating the quality of the identification of the element; analyzing the results of testing the hypotheses about the presence, completeness of composition, and types of composite parts, analyzing their correspondence to the hypothesis about the type in the case of a compound element; estimating the total reliability of the obtained variant, said optimizing the variants revision of the compound element components combinations comprising: assigning a number of variants with the best quality ratings which will be kept for further analysis to each type of the element; discarding the other variants; searching for the best variant of the compound element, taking into account the best total quality of its accountable composite parts, regardless of their number; analyzing the quality estimates of earlier rejected variants in order to find quality estimates higher than the current best variant estimate.
  • 15. A method of searching for an element of a form of non-fixed layout using a structural description, comprising at least the following steps (operations): searching for the object on the image; allocation of the found objects; determining types of the found objects; revealing the text objects to be mandatorily recognized, and determining the minimal required scope of recognition; recognizing said text objects; performing search for elements of the form comprising at least the following steps: selecting a searched element in the structural description; gaining the algorithm of obtaining the search constraints; searching for the element on the form image; examining the obtained variants; optimizing the variants revision of the compound element components combinations, said searching for an element comprising the use of the following characteristics: the spatial characteristics of the search area; the parametric characteristics of the element; the spatial characteristics of the element, said examining of the obtained variants comprising the following actions: identifying the obtained elements; analyzing the results of testing the hypotheses about the presence and completeness of composition of the elements, and the types of the composite parts, analyzing the correspondence to the hypothesis about the composition of the compound element, said optimizing the variants revision of the compound element components combinations comprising: assigning a number of variants with the best quality ratings which will be kept for further analysis to each type of the element; searching for the best variant of the compound element, taking into account the best total quality of its accountable composite parts, regardless of their number, analyzing the quality estimates of earlier rejected variants in order to find quality estimates higher than the current estimate.
  • 16. The method of searching, as recited in claim 14 or 15, wherein the orientation of the image is determined.
  • 17. The method of searching, as recited in claim 16, wherein all or a part of the elements of the structural description are used to determine the correct image orientation.
  • 18. The method of searching, as recited in claim 16, wherein an auxiliary brief description is optionally specified to determine the spatial orientation of the image.
  • 19. The method of searching, as recited in claim 16, wherein the image orientation that results in the coincidence of the objects with the description thereof having the highest quality rating is accepted as the correct one.
  • 20. The method of searching, as recited in claim 14 or 15, wherein the type of a document is selected from several preliminarily specified types.
  • 21. The method of searching, as recited in claim 20, wherein a supplementary brief structural description is optionally assigned for determining the document type and thus selecting the corresponding comprehensive document description from several preliminarily specified descriptions.
  • 22. The method of searching, as recited in claim 21, wherein the type of the document which corresponds to the current image is selected on the basis of comparing the quality estimates of the coincidence with the preliminarily specified candidate descriptions.
  • 23. The method of searching, as recited in claim 14 or 15, wherein initially the first element in the list is selected.
  • 24. The method of searching, as recited in claim 14 or 15, wherein the applied spatial characteristics of an element comprise at least its absolute coordinates and/or relative coordinates.
  • 25. The method of searching, as recited in claim 14 or 15, wherein the exact and/or interval characteristics of an element are used.
  • 26. The method of searching, as recited in claim 14 or 15, wherein at least the following spatial characteristics of the search area are used: a half plane, a rectangle, a circle, a polygon, or a combination thereof.
  • 27. The method of searching, as recited in claim 14 or 15, wherein revision of variants of combinations of the elements is considered complete if the total quality estimate of the complete set of elements achieves the quality value of 1.
  • 28. The method of searching, as recited in claim 14 or 15, wherein one to three variants of a compound element which have the best quality estimate are used for further analysis.
  • 29. The method of searching, as recited in claim 14 or 15, wherein three to ten variants of a simple element which have the best quality estimate are used for further analysis.
  • 30. The method of searching, as recited in claim 14 or 15, wherein searching for the next element is performed if no variants for the current element are found or the total quality rating is lower than the predefined level.
  • 31. The method of searching, as recited in claim 14 or 15, wherein, if no objects are found in the region of the image which is specified therefor, a further search is undertaken for an object corresponding to the next element of the structural description.
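The variant-pruning idea that recurs in the claims above (keep only a limited number of best-rated variants per element, then pick the combination of parts with the best total quality, stopping once a total quality of 1 is reached) might be sketched as follows. This is an illustrative assumption rather than the claimed method: the function names, the `(label, quality)` tuple format, and the use of a product of part qualities as the "total quality" are all hypothetical, and the re-examination of earlier rejected variants is omitted.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Assumption: a variant is a (label, quality) pair with quality in [0, 1].
Variant = Tuple[str, float]


def keep_best(variants: List[Variant], limit: int) -> List[Variant]:
    """Keep the `limit` variants with the highest quality and discard the rest."""
    return sorted(variants, key=lambda v: v[1], reverse=True)[:limit]


def best_compound(
    parts: Dict[str, List[Variant]], limit: int = 3
) -> Tuple[float, Optional[Dict[str, str]]]:
    """Return (total_quality, chosen_labels) for the best combination of parts.

    Assumption: the total quality of a compound element is the product
    of the qualities of its chosen part variants.
    """
    names = list(parts)
    pruned = [keep_best(parts[name], limit) for name in names]
    best_quality, best_choice = 0.0, None
    for combo in product(*pruned):
        total = 1.0
        for _, quality in combo:
            total *= quality
        if best_choice is None or total > best_quality:
            best_quality = total
            best_choice = {name: label for name, (label, _) in zip(names, combo)}
        if best_quality >= 1.0:  # a perfect total quality ends the revision early
            break
    return best_quality, best_choice


# Hypothetical candidate variants for the two parts of a compound element.
candidates = {
    "label": [("a", 0.5), ("b", 0.9), ("c", 0.2)],
    "value": [("x", 0.8), ("y", 1.0)],
}
quality, choice = best_compound(candidates, limit=2)
```

With `limit=2`, variant `c` is discarded before any combinations are formed, so only four combinations are examined instead of six.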
Priority Claims (1)
Number Date Country Kind
2006101908 A1 Jan 2006 RU national