Claims
- 1. A method for expanding whitespace in a document image having dimensions X, Y, comprising the steps of:
- obtaining a digital representation of said document image;
- identifying a set of rectangles that bound image data within the digital representation of said document image, said data including text and non-text data, each of said rectangles having an original size;
- mapping said set of rectangles to an area $C_1 X$ by $C_2 Y$, wherein $C_1$ and $C_2$ are not both equal to one; and
- preserving said original size of each of said rectangles;
- wherein an output document image is created having changed whitespace between said image data in said X and Y directions and unaltered image data size, said output document image being distorted from said document image by said value $C_1$ along said X direction and by said value $C_2$ along said Y direction.
- 2. A method for varying whitespace of an input document image comprising the steps of:
- obtaining a digital representation of said input document image;
- identifying a set of rectangles that bound image data within the input document image, said data including text and non-text data, each of said rectangles having an original size;
- assigning a first set of coordinates $(x_0, y_0)$ and $(x_n, y_m)$ to each one of said rectangles; and
- assigning each one of said rectangles to a second set of coordinates $(x_{0_2}, y_{0_2})$ and $(x_{n_2}, y_{m_2})$, wherein said second set of coordinates equals:
- $x_{0_2} = C_1 x_0$;
- $y_{0_2} = C_2 y_0$;
- $x_{n_2} = C_1 x_0 + (x_n - x_0)$; and
- $y_{m_2} = C_2 y_0 + (y_m - y_0)$,
- wherein $C_1$ and $C_2$ are not both equal to one; and
- wherein a second document image is created having a change in whitespace between each of said rectangles in said X and Y directions, wherein each of said rectangles maintains said original size, and wherein said second document image is distorted from said input document image by said value $C_1$ along an X direction and by said value $C_2$ along a Y direction.
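The coordinate mapping of claim 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `Rect` type and the `remap` function are hypothetical names, coordinates are assumed to grow rightward in X and downward in Y, and rejecting $C_1 = C_2 = 1$ with an exception is a design choice of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with corners (x0, y0) and (xn, ym)."""
    x0: float
    y0: float
    xn: float
    ym: float


def remap(rect: Rect, c1: float, c2: float) -> Rect:
    """Scale the rectangle's origin by (c1, c2) while keeping its width and height.

    Implements x0' = c1*x0, y0' = c2*y0, xn' = c1*x0 + (xn - x0),
    ym' = c2*y0 + (ym - y0).
    """
    if c1 == 1 and c2 == 1:
        raise ValueError("C1 and C2 must not both equal one")
    x0_2 = c1 * rect.x0
    y0_2 = c2 * rect.y0
    # Adding the original extents back preserves the rectangle's size.
    return Rect(x0_2, y0_2, x0_2 + (rect.xn - rect.x0), y0_2 + (rect.ym - rect.y0))
```

Consider two 40-unit-wide boxes, `Rect(10, 10, 50, 20)` and `Rect(60, 10, 100, 20)`, with $C_1 = 2$ and $C_2 = 1$. The first box is remapped to end at x = 60 and the second now starts at x = 120. The gap between them grows from 10 to 60, and both boxes keep their 40 by 10 size.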
- 3. The method for varying whitespace of claim 2 further comprising the steps of:
- verifying that said second document image fits within a physical page having dimensions X, Y; and
- dividing said second document image into an initial and a subsequent image area if said second document image has dimensions that exceed said physical page;
- wherein each of said initial and said subsequent image areas fit within said physical page.
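The verification and division steps of claim 3 can be sketched as cutting an expanded image area at each page boundary it crosses. The sketch assumes the following: page boundaries fall at multiples of the page height Y in the expanded coordinates, only vertical overflow is divided (an area wider than the page raises an error), and horizontal placement is left to the later mapping step. `Area` and `split_at_page_breaks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Image area with corners (x0, y0) and (xn, ym); Y grows down the page."""
    x0: float
    y0: float
    xn: float
    ym: float


def split_at_page_breaks(area: Area, page_w: float, page_h: float) -> list[tuple[int, Area]]:
    """Divide an expanded image area wherever it crosses a page boundary.

    Returns (page_index, piece) pairs with each piece in page-local coordinates:
    the first pair is the initial area, any later pairs are subsequent areas.
    """
    if area.xn - area.x0 > page_w:
        # Assumption of this sketch: horizontal overflow is not divided.
        raise ValueError("area is wider than the physical page")
    pieces = []
    top = area.y0
    while top < area.ym:
        page = int(top // page_h)
        page_top = page * page_h
        # Cut at the page bottom, or at the area bottom if that comes first.
        cut = min(area.ym, page_top + page_h)
        pieces.append((page, Area(area.x0, top - page_top, area.xn, cut - page_top)))
        top = cut
    return pieces
```

On a 612 by 792 page, an area spanning y = 700 to 900 is split into an initial piece from y = 700 to 792 on page 0 and a subsequent piece from y = 0 to 108 on page 1.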
- 4. The method for varying whitespace of claim 3 further comprising the steps of:
- assigning a numerical sequence to each of said image areas;
- renumbering said numerical sequence to include said initial and said subsequent image areas to obtain a second numerical sequence; and
- mapping, in said second numerical sequence, said image areas to said physical page.
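The renumbering of claim 4 can be sketched as flattening the pieces of each image area in their first-sequence order and then numbering them consecutively. The sketch assumes that each area is represented by the list of its pieces: one piece if it fit on the page, or its initial piece followed by its subsequent piece(s) if it was divided. The name `renumber` is hypothetical.

```python
from itertools import chain
from typing import Iterable, TypeVar

T = TypeVar("T")


def renumber(areas_in_first_order: Iterable[Iterable[T]]) -> list[tuple[int, T]]:
    """Build the second numerical sequence from areas listed in the first sequence."""
    # Flatten in order so each subsequent piece directly follows its initial
    # piece, then number the result consecutively from 1.
    return list(enumerate(chain.from_iterable(areas_in_first_order), start=1))
```

Placing each subsequent area immediately after its initial area preserves the reading order that the first sequence recorded. For example, `renumber([["A"], ["B-initial", "B-subsequent"], ["C"]])` numbers the four pieces 1 through 4 in that order.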
- 5. The method of claim 4, wherein the document image contains n image areas and wherein said n image areas comprise picture image areas and text image areas and wherein said step of mapping said image areas further comprises the steps of:
- (a) mapping said picture image areas to said physical page;
- (b) mapping a first text image area to an upper left of said physical page;
- (c) mapping an $A_i$ text image area beneath an $A_{i-1}$ text image area when said $A_i$ text image area fits in said physical page;
- (d) mapping said $A_i$ text image area to a portion of said physical page adjacent to a right-hand edge of said $A_{i-1}$ text image area when said $A_i$ text image area fits in said physical page and when step (c) exceeds a boundary of said physical page; and
- (e) beginning a new physical page when step (d) exceeds said boundary of said physical page.
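Steps (b) through (e) can be sketched as a greedy placement loop. The sketch makes several assumptions that the claim does not state:

- text areas arrive as (width, height) pairs, already in reading order and no larger than the page;
- "beneath" means directly under $A_{i-1}$ with the same left edge;
- "adjacent to a right-hand edge" means starting at $A_{i-1}$'s right edge, top-aligned with it;
- step (a) is not modelled, so collisions with picture areas are not checked.

`Placed` and `layout_text` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placed:
    page: int   # physical page index, starting at 0
    x: float    # left edge on that page
    y: float    # top edge on that page (Y grows downward)
    w: float
    h: float


def layout_text(sizes: list[tuple[float, float]], page_w: float, page_h: float) -> list[Placed]:
    """Place text image areas A_1..A_n, given as (width, height) in reading order."""
    placed: list[Placed] = []
    for w, h in sizes:
        if w > page_w or h > page_h:
            raise ValueError("text area exceeds the page; divide it first")
        if not placed:
            # (b) first text area goes to the upper left of the first page
            placed.append(Placed(0, 0, 0, w, h))
            continue
        prev = placed[-1]
        below_y = prev.y + prev.h
        right_x = prev.x + prev.w
        if prev.x + w <= page_w and below_y + h <= page_h:
            # (c) directly beneath A_{i-1}
            placed.append(Placed(prev.page, prev.x, below_y, w, h))
        elif right_x + w <= page_w and prev.y + h <= page_h:
            # (d) beside A_{i-1}'s right-hand edge, top-aligned with it (assumption)
            placed.append(Placed(prev.page, right_x, prev.y, w, h))
        else:
            # (e) start a new physical page at its upper left
            placed.append(Placed(prev.page + 1, 0, 0, w, h))
    return placed
```

Take a 100 by 100 page and areas of 40x60, 40x60, 40x30 and 50x50. The second area would overflow beneath the first, so it goes beside it. The third fits under the second, and the fourth fits neither beneath nor beside, so it starts page 1.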
- 6. The method of claim 2 wherein $C_1$ is greater than $C_2$.
- 7. The method of claim 2 wherein $C_2$ is greater than $C_1$.
- 8. In a character recognition system, a method for varying spacing in a document image and for preserving logical reading order, the method comprising the steps of:
- providing a representation of a document image to a run length extraction and classification means, said representation comprised of a plurality of scan lines;
- extracting run lengths from each individual scan line of said representation of said document image;
- classifying each of said run lengths as one of short, medium or long based on a length of said run length, wherein a plurality of run length records are created;
- constructing rectangles from said run length information, each of said rectangles representing a portion of said document image, each of said rectangles having a set of coordinates $(x_0, y_0, x_n, y_m)$, and each of said rectangles separated by a space; and
- assigning each one of said rectangles to a second set of coordinates $(x_{0_2}, y_{0_2})$ and $(x_{n_2}, y_{m_2})$, wherein said second set of coordinates equals:
- $x_{0_2} = C_1 x_0$;
- $y_{0_2} = C_2 y_0$;
- $x_{n_2} = C_1 x_0 + (x_n - x_0)$; and
- $y_{m_2} = C_2 y_0 + (y_m - y_0)$,
- wherein $C_1$ and $C_2$ are not both equal to one; and
- wherein a second document image is created having spacing different in size from that of said document image, said second document image being distorted from said document image by said different spacing.
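The run length steps of claim 8 can be sketched on a binary image in which 1 is a black pixel. The short/medium/long thresholds `SHORT_MAX` and `LONG_MIN` are hypothetical values. Building rectangles by grouping runs that overlap in X on adjacent scan lines is a simple connected-component stand-in; it is not necessarily the construction the claim uses. `Run`, `classify`, `extract_runs` and `build_rectangles` are likewise hypothetical names. The resulting rectangles use inclusive pixel coordinates $(x_0, y_0, x_n, y_m)$ and could then be remapped with the second-set formulas.

```python
from dataclasses import dataclass

SHORT_MAX = 4   # hypothetical: runs of at most 4 pixels are "short"
LONG_MIN = 40   # hypothetical: runs of at least 40 pixels are "long"


@dataclass(frozen=True)
class Run:
    line: int   # scan line index (y)
    start: int  # first black pixel (x)
    end: int    # last black pixel (x), inclusive
    kind: str   # "short", "medium" or "long"


def classify(length: int) -> str:
    if length <= SHORT_MAX:
        return "short"
    if length >= LONG_MIN:
        return "long"
    return "medium"


def extract_runs(scanlines: list[list[int]]) -> list[Run]:
    """Extract maximal runs of black (1) pixels from each scan line on its own."""
    runs = []
    for y, line in enumerate(scanlines):
        x = 0
        while x < len(line):
            if line[x]:
                start = x
                while x < len(line) and line[x]:
                    x += 1
                runs.append(Run(y, start, x - 1, classify(x - start)))
            else:
                x += 1
    return runs


def build_rectangles(runs: list[Run]) -> list[tuple[int, int, int, int]]:
    """Group runs that overlap in X on adjacent scan lines and return one
    bounding rectangle (x0, y0, xn, ym) per group, sorted top to bottom."""
    parent = list(range(len(runs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Quadratic pair scan: simple, and adequate for a sketch.
    for i, a in enumerate(runs):
        for j, b in enumerate(runs):
            if b.line == a.line + 1 and b.start <= a.end and a.start <= b.end:
                parent[find(i)] = find(j)

    boxes: dict[int, tuple[int, int, int, int]] = {}
    for i, r in enumerate(runs):
        root = find(i)
        x0, y0, xn, ym = boxes.get(root, (r.start, r.line, r.end, r.line))
        boxes[root] = (min(x0, r.start), min(y0, r.line), max(xn, r.end), max(ym, r.line))
    return sorted(boxes.values(), key=lambda b: (b[1], b[0]))
```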
- 9. The method of claim 8 further comprising the steps of:
- classifying each of said rectangles as type image, vertical line, horizontal line or unknown;
- creating a plurality of text blocks from said rectangles classified as unknown;
- assigning a numerical order to each of said text blocks;
- verifying that said second set of coordinates fits within a physical page having dimensions X, Y;
- dividing a given image area having said second set of coordinates that exceed said physical page into an initial and a subsequent image area; and
- wherein each of said initial and said subsequent image areas fit within said physical page.
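The classification, text-block and ordering steps of claim 9 can be sketched on rectangles given as inclusive pixel tuples $(x_0, y_0, x_n, y_m)$. The sketch relies on several stand-ins:

- the size thresholds (`LINE_THICKNESS`, `LINE_MIN_LENGTH`, `IMAGE_MIN_SIDE`, `GAP`) are hypothetical;
- text blocks are formed by merging "unknown" rectangles that lie within `GAP` pixels of each other;
- numerical order is top-to-bottom, then left-to-right.

These are not the claimed method, and the function names are hypothetical.

```python
LINE_THICKNESS = 3     # hypothetical: max thickness (px) of a ruled line
LINE_MIN_LENGTH = 50   # hypothetical: min length (px) of a ruled line
IMAGE_MIN_SIDE = 100   # hypothetical: min side (px) of a picture region
GAP = 8                # hypothetical: max gap (px) between parts of one text block

Box = tuple[int, int, int, int]  # (x0, y0, xn, ym), inclusive pixel coordinates


def classify_rect(x0: int, y0: int, xn: int, ym: int) -> str:
    w, h = xn - x0 + 1, ym - y0 + 1
    if h <= LINE_THICKNESS and w >= LINE_MIN_LENGTH:
        return "horizontal line"
    if w <= LINE_THICKNESS and h >= LINE_MIN_LENGTH:
        return "vertical line"
    if w >= IMAGE_MIN_SIDE and h >= IMAGE_MIN_SIDE:
        return "image"
    return "unknown"


def near(a: Box, b: Box, gap: int = GAP) -> bool:
    """True if rectangles a and b are within `gap` pixels of each other."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2]
            and a[1] - gap <= b[3] and b[1] - gap <= a[3])


def text_blocks(rects: list[Box]) -> list[tuple[int, Box]]:
    """Merge 'unknown' rectangles into text blocks and number them in order."""
    blocks = [r for r in rects if classify_rect(*r) == "unknown"]
    merged = True
    while merged:  # repeat until no two blocks are near each other
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if near(blocks[i], blocks[j]):
                    a, b = blocks[i], blocks[j]
                    blocks[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del blocks[j]
                    merged = True
                    break
            if merged:
                break
    ordered = sorted(blocks, key=lambda b: (b[1], b[0]))
    return list(enumerate(ordered, start=1))
```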
- 10. The method for varying whitespace of claim 9 further comprising the steps of:
- renumbering said numerical order to include said initial and said subsequent image areas to obtain a second numerical order; and
- mapping, in said second numerical order, said image areas to said physical page.
- 11. The method of claim 10, wherein the document image contains n image areas and wherein said n image areas comprise picture image areas and text image areas and wherein said step of mapping said image areas further comprises the steps of:
- (a) mapping said picture image areas to said physical page;
- (b) mapping a first text image area to an upper left of said physical page;
- (c) mapping an $A_i$ text image area beneath an $A_{i-1}$ text image area when said $A_i$ text image area fits in said physical page;
- (d) mapping said $A_i$ text image area to a portion of said physical page adjacent to a right-hand edge of said $A_{i-1}$ text image area when said $A_i$ text image area fits in said physical page and when step (c) exceeds a boundary of said physical page; and
- (e) beginning a new physical page when step (d) exceeds said boundary of said physical page.
- 12. An apparatus for varying spacing in a document image comprising:
- means for scanning said document image from a first page having a size, said document image having image data including text having a typesize, said text surrounded by space having an area;
- means for identifying a set of rectangles that bound said image data within the document image;
- means for mapping said set of rectangles to an area $C_1 X$ by $C_2 Y$ to form a second document image distorted along said X direction by a factor of $C_1$ and along said Y direction by a factor of $C_2$;
- wherein $C_1$ and $C_2$ are not both equal to one; and
- means for printing said second document image onto a second page having a size, wherein said first page size is equal to said second page size;
- wherein said second document image includes image data including text having said typesize and spacing having an expanded area.
- 13. An apparatus for varying spacing of an input document image including text, graphics, and noise images, said text having a type size, comprising:
- means for scanning said input document image;
- means for identifying a set of rectangles that bound image data within said input document image, each of said rectangles having an input size and separated by a spacing;
- means for assigning a first set of coordinates $(x_0, y_0)$ and $(x_n, y_m)$ to each one of said rectangles; and
- means for assigning each one of said rectangles to a second set of coordinates $(x_{0_2}, y_{0_2})$ and $(x_{n_2}, y_{m_2})$, wherein said second set of coordinates equals:
- $x_{0_2} = C_1 x_0$;
- $y_{0_2} = C_2 y_0$;
- $x_{n_2} = C_1 x_0 + (x_n - x_0)$; and
- $y_{m_2} = C_2 y_0 + (y_m - y_0)$;
- wherein a second document image is created, said second document image being distorted from said input document image by a factor $C_1$ in said X direction and by a factor $C_2$ in said Y direction, wherein $C_1$ and $C_2$ are not both equal to one;
- wherein said second document image has a change in the spacing between said text, graphics and noise images in said X and Y directions.
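The second-set formulas determine how much the whitespace between two rectangles changes, and this can be worked out directly. Take rectangles A and B on the same band, with B to the right of A. Let A have width $w_A = x_n^{A} - x_0^{A}$, and let the original gap be $g = x_0^{B} - x_n^{A}$. The superscripts are added here only to tell the two rectangles apart. Using $x_0^{B} - x_0^{A} = g + w_A$:

```latex
\begin{aligned}
g' &= x_{0_2}^{B} - x_{n_2}^{A} \\
   &= C_1 x_0^{B} - \bigl(C_1 x_0^{A} + (x_n^{A} - x_0^{A})\bigr) \\
   &= C_1 (x_0^{B} - x_0^{A}) - w_A \\
   &= C_1 (g + w_A) - w_A \\
   &= C_1 g + (C_1 - 1)\, w_A
\end{aligned}
```

For $C_1 > 1$ the gap therefore grows by more than the factor $C_1$, because each rectangle keeps its width while its left edge moves outward. For $C_1 < 1$ the gap shrinks, and the rectangles overlap whenever $C_1 g + (C_1 - 1) w_A < 0$. The same derivation applies in Y, using $C_2$, $y_0$, $y_m$ and the height of A.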
- 14. A photocopier comprising:
- input means for obtaining a first digital representation of an input hard copy document;
- user input means for designating a magnification scale;
- a first portion of memory for storing said first digital representation;
- a processor, coupled to said input means, said first portion of memory, and said user input means, said processor for:
- (a) identifying text and graphics images in said first digital representation and bounding said images with a set of rectangles, each of said rectangles separated by spacing in an X and a Y direction, each of said rectangles having a size;
- (b) assigning a first set of coordinates to each of said rectangles;
- (c) assigning a second set of coordinates to each of said rectangles according to said magnification scale wherein said rectangles remain unchanged in size and said spacing between said rectangles changes in size in said X and Y directions;
- (d) producing a second digital representation of said document; and
- (e) storing said second digital representation of said document in a second portion of said memory;
- an output device, coupled to said processor and to said memory, for outputting said second digital representation of said document on paper wherein a second hard copy document, distorted from said input hard copy document by said changed spacing, is produced.
- 15. A document processing apparatus for altering the spacing of an input document, the apparatus comprising:
- a user input device having a selectable document whitespace magnification scale;
- a memory having a first portion and a second portion;
- an input device, coupled to said memory, wherein said document is transformed by said input device into a first digital representation and stored in said first portion of memory;
- a processor, coupled to said input device, said memory, and said user input device, said processor for:
- (a) identifying text and graphics images in said first digital representation and bounding said text and graphics images with a set of rectangles, each of said rectangles having a size and each separated from one another by spacing in X and Y directions;
- (b) assigning a first set of coordinates to each of said rectangles;
- (c) assigning a second set of coordinates to each of said rectangles according to said magnification scale wherein said rectangles remain unchanged in size and said spacing between said rectangles is changed in size by an amount equal to said magnification scale;
- (d) producing a second digital representation of said document; and
- (e) storing said second digital representation of said document in a second portion of said memory;
- an output device, connected to said processor and said memory, wherein said second digital representation is output on paper to produce an output document distorted from said input document by a factor of said magnification scale to have a change in whitespace along said X and Y directions.
- 16. The apparatus of claim 15 wherein said apparatus comprises an area network.
- 17. The apparatus of claim 15 wherein said apparatus comprises a personal computer.
Parent Case Info
This is a continuation of application Ser. No. 08/028,676, filed Mar. 9, 1993, now abandoned, which is a continuation-in-part of application Ser. No. 07/864,423 titled "Segmentation of Text, Picture and Lines of a Document Image" filed Apr. 6, 1992.
Continuations (1)

| | Number | Date | Country |
|---|---|---|---|
| Parent | 28676 | Mar 1993 | |
Continuations-in-Part (1)

| | Number | Date | Country |
|---|---|---|---|
| Parent | 864432 | Apr 1992 | |