Claims
- 1. A method for sorting document images stored in a memory of a document management system, comprising the steps of:
segmenting each document image recorded in the memory into a set of layout objects; each layout object in each of the sets of layout objects being one of a plurality of layout object types; each of the plurality of layout object types identifying a structural element of a document; selecting a feature of a document from a set of features; each of the features in the set of features identifying groups of layout objects in different ones of the sets of layout objects recorded in the memory; assembling in the memory a set of image segments; each image segment in the set of image segments identifies those layout objects of a document image stored in the memory that form the selected feature; and sorting the assembled image segments into clusters in the memory; each cluster defining a grouping of image segments that have similar layout objects forming the selected feature.
- 2. The method according to claim 1, wherein said selecting step selects a plurality of features from the set of features.
- 3. The method according to claim 1, further comprising the step of selecting a subset of document images recorded in the memory.
- 4. The method according to claim 3, wherein said assembling step assembles the set of image segments by identifying those layout objects of each document image in the selected subset of document images.
- 5. The method according to claim 1, wherein said sorting step further comprises the steps of:
selecting a first image segment from the set of image segments; computing a distance measurement between the first image segment and image segments remaining in the set of image segments; and defining a first cluster with the first image segment and certain of the remaining image segments having a distance measurement that is within a threshold distance.
- 6. The method according to claim 1, wherein said sorting step comprises the steps of:
selecting a document image from the memory; assembling a single image segment by identifying those layout objects of the selected document image that form the selected feature; computing a distance measurement between the single image segment and each image segment in the set of image segments; and forming clusters of document images by ranking the distance measurement of each image segment in the set of image segments.
- 7. The method according to claim 1, further comprising the step of displaying the assembled image segments in the clusters sorted by said sorting step.
- 8. The method according to claim 1, further comprising the step of computing attributes for each layout object in the set of layout objects; the computed attributes of each layout object having values that quantify properties of a structural element and identify spatial relationships with other segmented layout objects in the document image.
- 9. The method according to claim 8, further comprising the step of executing a routine for identifying a feature of the document image; the routine having a sequence of selection operations that consumes the set of layout objects and uses the computed attributes to produce a subset of layout objects; said executing step identifying the subset of layout objects as the feature of the document image.
- 10. The method according to claim 1, further comprising the step of defining a structural model for identifying a genre of document; wherein the structural model defines a class of document images which express a common communicative purpose that is independent of document content.
- 11. The method according to claim 1, further comprising the step of providing a user interface for selecting the feature.
- 12. The method according to claim 1, wherein said assembling step assembles more than one layout object to form the selected feature of a document image stored in the memory.
- 13. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for sorting document images stored in a memory of a document management system, said method steps comprising:
segmenting each document image recorded in the memory into a set of layout objects; each layout object in each of the sets of layout objects being one of a plurality of layout object types; each of the plurality of layout object types identifying a structural element of a document; selecting a feature of a document from a set of features; each of the features in the set of features identifying groups of layout objects in different ones of the sets of layout objects recorded in the memory; assembling in the memory a set of image segments; each image segment in the set of image segments identifies those layout objects of a document image stored in the memory that form the selected feature; and sorting the assembled image segments into clusters in the memory; each cluster defining a grouping of image segments that have similar layout objects forming the selected feature.
- 14. The program storage device as recited in claim 13, wherein said sorting step of said method steps further comprises the steps of:
selecting a first image segment from the set of image segments; computing a distance measurement between the first image segment and image segments remaining in the set of image segments; and defining a first cluster with the first image segment and certain of the remaining image segments having a distance measurement that is within a threshold distance.
- 15. The program storage device as recited in claim 13, wherein said sorting step of said method steps further comprises the steps of:
selecting a document image from the memory; assembling a single image segment by identifying those layout objects of the selected document image that form the selected feature; computing a distance measurement between the single image segment and each image segment in the set of image segments; and forming clusters of document images by ranking the distance measurement of each image segment in the set of image segments.
- 16. A document management system for sorting document images, comprising:
a memory for storing the document images and image processing instructions of the document management system; and a processor coupled to the memory for executing the image processing instructions of the document management system; the processor in executing the image processing instructions:
segmenting each document image recorded in the memory into a set of layout objects; each layout object in each of the sets of layout objects being one of a plurality of layout object types; each of the plurality of layout object types identifying a structural element of a document; selecting a feature of a document from a set of features; each of the features in the set of features identifying groups of layout objects in different ones of the sets of layout objects recorded in the memory; assembling in the memory a set of image segments; each image segment in the set of image segments identifies those layout objects of a document image stored in the memory that form the selected feature; and sorting the assembled image segments into clusters in the memory; each cluster defining a grouping of image segments that have similar layout objects forming the selected feature.
- 17. The document management system according to claim 16, further comprising a program interface for selecting the feature.
- 18. The document management system according to claim 16, wherein said program interface provides means for selecting a first feature and a second feature from the set of features.
- 19. The document management system according to claim 16, further comprising means for selecting a set of document images recorded in the memory.
- 20. The document management system according to claim 19, wherein said selecting assembles a set of image segments by identifying those layout objects of each document image in the selected set of document images.
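For illustration only, the two sorting variants recited in claims 5 and 6 (and mirrored in claims 14 and 15) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `ImageSegment` class, the bounding-box representation, the Euclidean `distance` function, and the function names are all hypothetical, since the claims do not specify how a distance measurement is computed. Repeating the threshold step until no segments remain is likewise an assumption; claim 5 itself recites only the definition of a first cluster.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class ImageSegment:
    """Hypothetical image segment: the layout objects of one document image
    that form the selected feature, reduced here to a single bounding box."""
    doc_id: str
    x: float
    y: float
    width: float
    height: float


def distance(a: ImageSegment, b: ImageSegment) -> float:
    """Assumed distance measurement: position offset plus size difference.
    The claims leave the measurement unspecified; this is illustrative."""
    return hypot(a.x - b.x, a.y - b.y) + hypot(a.width - b.width, a.height - b.height)


def threshold_clusters(segments, threshold):
    """Claim 5 style: select a first segment, group the remaining segments
    whose distance to it is within the threshold, then repeat on the rest
    (the repetition is an assumption, not recited in the claim)."""
    remaining = list(segments)
    clusters = []
    while remaining:
        first = remaining.pop(0)
        cluster, leftover = [first], []
        for seg in remaining:
            if distance(first, seg) <= threshold:
                cluster.append(seg)
            else:
                leftover.append(seg)
        clusters.append(cluster)
        remaining = leftover
    return clusters


def rank_by_query(query, segments):
    """Claim 6 style: rank every segment in the set by its distance to a
    single segment assembled from a selected document image."""
    return sorted(segments, key=lambda seg: distance(query, seg))


if __name__ == "__main__":
    # Made-up sample data: two similar memo headers and one letter block.
    segments = [
        ImageSegment("memo-1", 10, 10, 200, 40),
        ImageSegment("memo-2", 12, 11, 198, 42),
        ImageSegment("letter-1", 300, 50, 120, 80),
    ]
    for cluster in threshold_clusters(segments, threshold=10.0):
        print([seg.doc_id for seg in cluster])
    print([seg.doc_id for seg in rank_by_query(segments[2], segments)])
```

With the sample data above, the threshold variant groups the two memo segments together and leaves the letter segment in its own cluster, while the ranking variant orders all segments by closeness to the letter segment. The choice of seed segment and of threshold affects the resulting clusters in the threshold variant.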
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] Cross-reference is made to U.S. patent application Ser. Nos. 08/AAA,AAA, entitled “System For Dynamically Specifying Layout Components Of Document Images” (Attorney Docket No. D/97346), 08/AAA,AAA, entitled “System For Summarizing A Corpus Of Documents By Assembling User Specified Layout Components” (Attorney Docket No. D/97493), and 08/AAA,AAA, entitled “System For Progressively Transmitting And Displaying Layout Components Of Document Images” (Attorney Docket No. D/97495), which are assigned to the same assignee as the present invention.