Claims
- 1. A document processing system for use in identifying a segmented document, comprising:
a data store of layout graph models that are at least one of classified and labeled; a matching module operable to make a determination of a match between a layout graph sample for the segmented document and a particular layout graph model of said data store, wherein said matching module has a correlator generating an identified, segmented document that is at least one of classified and labeled based on the segmented document, the layout graph model, and the determination of a match.
- 2. The system of claim 1, wherein said matching module is operable to generate a node map useful for matching nodes of the particular layout graph model to nodes of the layout graph sample.
- 3. The system of claim 1, wherein said correlator is operable to assign labels of labeled nodes of the layout graph model to segments of the segmented document, wherein the segments relate to nodes of the layout graph sample that match the labeled nodes having the labels.
- 4. The system of claim 1, wherein said correlator is operable to assign a classification of the layout graph model to the segmented document based on the determination of a match.
- 5. The system of claim 1, further comprising a document segmentation engine operable to segment a document, thereby generating the segmented document.
- 6. The system of claim 1, further comprising a layout graphing module operable to build the layout graph sample based on the segmented document.
- 7. The system of claim 1, further comprising a verification module operable to perform an evaluation relating to accuracy of at least one of classification and labeling of the identified, segmented document, and to improve at least one layout graph model of said data store based on the evaluation.
- 8. The system of claim 1, wherein the layout graph models are comprised of nodes and edges, wherein the nodes represent document segments relating to a class of documents, and the edges are based on observed spatial inter-relation of the document segments.
- 9. The system of claim 1, wherein said data store of layout graph models has a hierarchical organization with layout graph models representing document subclasses that are subordinate to a specific document class related to a specific layout graph model representing the specific document class in a subordinate fashion, and wherein said matching module is operable to successively attempt matches between the layout graph sample and multiple layout graph models based on the hierarchical organization.
- 10. A method of classifying and labeling a segmented document, comprising:
receiving a layout graph sample for the segmented document; making a determination of a match between the layout graph sample and a layout graph model that is at least one of classified and labeled; and generating an identified, segmented document that is at least one of classified and labeled based on the segmented document, the layout graph model, and the determination of a match.
- 11. The method of claim 10, wherein said segmented document corresponds to an unclassified, unlabeled, segmented document, and said receiving a layout graph sample corresponds to receiving an unclassified, unlabeled layout graph sample.
- 12. The method of claim 10, wherein said generating an identified, segmented document includes:
(a) assigning a classification of the layout graph model to the segmented document based on the determination of a match; and (b) assigning labels of labeled nodes of the layout graph model to segments of the segmented document, wherein the segments relate to nodes of the layout graph sample that match the labeled nodes having the labels.
- 13. The method of claim 10, wherein the segmented document corresponds to an unlabeled, segmented document.
- 14. The method of claim 10, wherein the segmented document is at least one of pre-classified and pre-labeled, and wherein said generating a classified, labeled, segmented document at least one of re-classifies, re-labels, further classifies, and further labels the segmented document.
- 15. The method of claim 10, wherein said generating an identified, segmented document includes assigning labels of labeled nodes of the labeled, layout graph model to segments of the segmented document, wherein the segments relate to nodes of the layout graph sample that match the labeled nodes having the labels.
- 16. The method of claim 10, wherein said generating a classified, labeled, segmented document includes assigning a classification of the layout graph model to the segmented document based on the determination of a match.
- 17. The method of claim 10, comprising segmenting a document, thereby generating a segmented document.
- 18. The method of claim 10, wherein said receiving a layout graph sample includes building the layout graph sample based on the segmented document.
- 19. The method of claim 10, wherein said making a determination of a match between the layout graph sample and a layout graph model includes:
(a) accessing a data store of layout graph models having a hierarchical organization, with layout graph models representing document subclasses that are subordinate to a specific document class related to a specific layout graph model representing the specific document class in a subordinate fashion; and (b) successively attempting matches between the layout graph sample and multiple layout graph models based on the hierarchical organization.
- 20. A method of building a labeled, layout graph model for a class of documents, comprising:
receiving segmentation results of at least one segmentation of at least one document of the class of documents; instantiating nodes to represent document segments of a page for the class of documents based on the segmentation results, wherein the nodes store information identifying characteristics of the represented document segments; and instantiating edges relating nodes to one another based on the segmentation results, wherein the edges store information identifying spatial inter-relation of the document segments represented by the nodes.
- 21. The method of claim 20, comprising labeling the nodes based on predefined categories for content of corresponding document segments for the class of documents.
- 22. The method of claim 21, further comprising:
using the layout graph model to accomplish assignment of labels to new document segments of a new segmented document; making a verification of assignment of labels to the new document segments; and improving the labeled, layout graph model based on the verification of assignment of labels.
- 23. The method of claim 20, comprising classifying the layout graph model based on the class of documents.
- 24. The method of claim 20, further comprising:
using the layout graph model to perform a classification associating a new, segmented document with the class of documents; making a verification of the classification of the new, segmented document; and improving the layout graph model based on the verification of the classification.
- 25. The method of claim 20, wherein said receiving segmentation results includes segmenting at least one document of the class of documents, thereby generating the segmentation results.
- 26. The method of claim 20, wherein said receiving segmentation results includes observing segmentation results of at least one segmentation of at least one document of the class of documents.
- 27. A method of making a match between layout graph models for use with classifying and labeling documents, comprising:
receiving a layout graph sample; comparing the layout graph sample to at least one layout graph model that is at least one of classified and labeled; and finding a best match between the layout graph sample and a particular layout graph model.
- 28. The method of claim 27, wherein said finding a best match comprises:
making a best one-to-one match between the layout graph sample and the particular layout graph model; identifying unmatched nodes; and matching the unmatched nodes independently of one another but with reference to the best one-to-one match.
- 29. The method of claim 27, wherein said making a best match includes mapping nodes from the layout graph sample to nodes of the layout graph model.
- 30. The method of claim 29, wherein said making a best match includes computing a cost for a pair of mapped nodes, wherein the cost is defined as a sum of differences between corresponding node attributes, wherein the sum is weighted by weight factors of a node of the layout graph model, wherein the node is a member of the pair of mapped nodes.
- 31. The method of claim 29, wherein said making a best match includes computing a cost for a pair of mapped edges, wherein the cost is defined as a sum of differences between corresponding edge attributes, wherein the sum is weighted by weight factors of an edge of the layout graph model, wherein the edge is a member of the pair of mapped edges.
- 32. The method of claim 29, wherein said making a best match includes computing a sum of node pair costs and edge pair costs, wherein a mapping of minimal cost is defined as the best match.
- 33. The method of claim 29, wherein said making a determination of a match between the layout graph sample and a layout graph model includes:
(a) accessing a data store of layout graph models having a hierarchical organization, with layout graph models representing document subclasses that are subordinate to a specific document class related to a specific layout graph model representing the specific document class in a subordinate fashion; and (b) successively attempting matches between the layout graph sample and multiple layout graph models based on the hierarchical organization.
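By way of illustration only, the cost-based matching recited in claims 29 to 32 may be sketched as follows. The sketch is not the claimed implementation. The names `Element`, `LayoutGraph`, `pair_cost`, `mapping_cost`, and `best_match` are hypothetical, as are the attribute names `y` and `dy`. The use of absolute differences, a default weight factor of 1.0, the skipping of sample edges that have no counterpart in the model, and the exhaustive search over mappings are all assumptions made to keep the example small.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Dict, List, Tuple


@dataclass
class Element:
    """A node or an edge: numeric attributes plus optional model-side weight factors."""
    attrs: Dict[str, float]
    weights: Dict[str, float] = field(default_factory=dict)
    label: str = ""


@dataclass
class LayoutGraph:
    nodes: List[Element]
    # Directed edges keyed by (from_node_index, to_node_index).
    edges: Dict[Tuple[int, int], Element] = field(default_factory=dict)


def pair_cost(sample: Element, model: Element) -> float:
    """Sum of attribute differences, weighted by the model element's weight factors.

    Assumption: differences are absolute values and a missing weight defaults to 1.0.
    """
    return sum(
        model.weights.get(name, 1.0) * abs(sample.attrs[name] - value)
        for name, value in model.attrs.items()
        if name in sample.attrs
    )


def mapping_cost(sample: LayoutGraph, model: LayoutGraph, mapping: Tuple[int, ...]) -> float:
    """Sum of node pair costs and edge pair costs; mapping[i] is the model node for sample node i."""
    cost = sum(pair_cost(sample.nodes[i], model.nodes[m]) for i, m in enumerate(mapping))
    for (i, j), sample_edge in sample.edges.items():
        model_edge = model.edges.get((mapping[i], mapping[j]))
        if model_edge is not None:  # assumption: edges without a counterpart add no cost
            cost += pair_cost(sample_edge, model_edge)
    return cost


def best_match(sample: LayoutGraph, model: LayoutGraph) -> Tuple[Tuple[int, ...], float]:
    """Return the one-to-one mapping of minimal cost, found by exhaustive search.

    Assumption: the sample has no more nodes than the model.
    """
    candidates = permutations(range(len(model.nodes)), len(sample.nodes))
    best = min(candidates, key=lambda m: mapping_cost(sample, model, m))
    return best, mapping_cost(sample, model, best)


# Hypothetical two-segment model: a title near the top of the page, a body below it.
model = LayoutGraph(
    nodes=[Element({"y": 0.10}, weights={"y": 2.0}, label="title"),
           Element({"y": 0.60}, label="body")],
    edges={(0, 1): Element({"dy": 0.50})},
)
# Sample segments arrive in a different order than the model nodes.
sample = LayoutGraph(
    nodes=[Element({"y": 0.62}), Element({"y": 0.12})],
    edges={(1, 0): Element({"dy": 0.50})},
)
mapping, cost = best_match(sample, model)
labels = [model.nodes[m].label for m in mapping]
print(mapping, labels)  # (1, 0) ['body', 'title']
```

The exhaustive search is practical only for a handful of nodes; it is used here solely to make the minimal-cost definition of claim 32 concrete. Once a mapping is found, the labels of the matched model nodes can be carried over to the corresponding segments, in the manner of claims 3 and 15.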
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 60/337,073, filed on Dec. 4, 2001. The disclosure of the above application is incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60337073 | Dec 2001 | US |