Claims
- 1. A computer-readable medium having stored thereon a data structure comprising:
data representative of a page sequence resulting from a batch scanning process of at least two documents; and
data representative of at least one feature attribute for each page of the page sequence and having a one-to-one association with each page of the page sequence.
- 2. The computer-readable medium of claim 1, further comprising data representative of a document break between the at least two documents when corresponding data representative of the at least one feature attribute for one page is sufficiently dissimilar to corresponding data representative of the at least one feature attribute for an adjacent page.
- 3. The computer-readable medium of claim 2, wherein the at least one feature attribute is two or more feature attributes.
- 4. The computer-readable medium of claim 1, wherein each page of the page sequence is separately distinguishable from each other page.
- 5. The computer-readable medium of claim 1, wherein the at least one feature attribute comprises any of: at least one specific layout attribute, at least one general layout attribute, at least one textual attribute, and at least one image attribute.
- 6. The computer-readable medium of claim 5, wherein the at least one specific layout attribute comprises any of: page numbers, headings, page headers, page footers, and captions.
- 7. The computer-readable medium of claim 5, wherein the at least one general layout attribute comprises any of: block locations, block dimensions, block line statistics, and block text statistics.
- 8. The computer-readable medium of claim 5, wherein the at least one textual attribute comprises any of: word statistics and textual continuity.
- 9. The computer-readable medium of claim 5, wherein the at least one image attribute comprises any of: color, intensity histograms, wavelet coefficients, and connected component statistics.
- 10. A method for discriminating between documents within a plurality of scanned pages resulting from a batch scanning process of at least two documents, the method comprising steps of:
receiving data representative of scanned pages of at least two documents, absent modification to the scanned pages;
determining, for each scanned page of the plurality of scanned pages, at least one feature attribute;
comparing the at least one feature attribute for a current scanned page of the plurality of scanned pages with a corresponding at least one feature attribute for at least one previous scanned page of the plurality of scanned pages; and
determining existence of a document break between the current scanned page and a least distant scanned page of the at least one previous scanned page.
- 11. The method of claim 10, wherein the step of determining existence of a document break occurs when the step of comparing indicates that the current scanned page and the least distant scanned page are sufficiently dissimilar.
- 12. The method of claim 10, wherein the step of determining existence of a document break occurs when the step of comparing indicates that the current scanned page and the least distant scanned page are sufficiently dissimilar.
- 13. The method of claim 10, wherein the step of determining existence of a document break occurs when the step of comparing fails to indicate that the current scanned page and the least distant scanned page are sufficiently similar.
- 14. The method of claim 10, wherein the step of determining existence of a document break occurs when the step of comparing fails to indicate that the current scanned page and the least distant scanned page are sufficiently similar.
- 15. The method of claim 10, wherein the step of determining existence of a document break occurs when the step of comparing indicates that the current scanned page and the least distant scanned page are more dissimilar than similar.
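For illustration only, the data structure of claims 1 and 2 and the comparing step of claims 10 and 11 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `ScannedPage`, `dissimilarity`, and `find_document_breaks`, the attribute keys, the mean-absolute-difference metric, and the threshold value are all hypothetical, and the current page is compared only with the immediately preceding page, the simplest case of "at least one previous scanned page".

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ScannedPage:
    """One page of the batch, paired one-to-one with its feature attributes."""
    index: int                  # position of the page in the scanned sequence
    features: Dict[str, float]  # hypothetical attribute names -> numeric values


def dissimilarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Mean absolute difference over shared attributes (an assumed metric)."""
    shared = a.keys() & b.keys()
    if not shared:
        # Assumption: pages with no comparable attributes are maximally dissimilar.
        return float("inf")
    return sum(abs(a[k] - b[k]) for k in shared) / len(shared)


def find_document_breaks(pages: Sequence[ScannedPage],
                         threshold: float = 0.25) -> List[int]:
    """Return the index of each page that starts a new document.

    A break is recorded between a page and its immediate predecessor when
    their feature attributes differ by more than the (assumed) threshold.
    """
    breaks: List[int] = []
    for previous, current in zip(pages, pages[1:]):
        if dissimilarity(previous.features, current.features) > threshold:
            breaks.append(current.index)
    return breaks


# Hypothetical batch: two pages of one document followed by two of another.
pages = [
    ScannedPage(0, {"text_block_width": 0.80, "mean_intensity": 0.90}),
    ScannedPage(1, {"text_block_width": 0.82, "mean_intensity": 0.88}),
    ScannedPage(2, {"text_block_width": 0.40, "mean_intensity": 0.30}),
    ScannedPage(3, {"text_block_width": 0.42, "mean_intensity": 0.33}),
]
breaks = find_document_breaks(pages, threshold=0.25)  # breaks == [2]
```

In this sketch the scanned pages themselves are never altered; only the per-page attributes are read. The alternative break conditions of claims 13 through 15 (failure to find sufficient similarity, or more dissimilar than similar) would correspond to different comparison rules in place of the single threshold used here.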
Parent Case Info
[0001] This application is a continuation of and claims priority from allowed application Ser. No. 09/583,049, filed May 30, 2000, issuing as U.S. Pat. No. 6,735,335, the content of which is herein incorporated by reference in its entirety.
Continuations (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09583049 | May 2000 | US |
| Child | 10840617 | May 2004 | US |