The present invention relates generally to the field of computers and computer applications, and in particular to a system and method of specifying document layout definition.
Computers are increasingly used to handle and process documents, including documents that are composites of text, photographs, drawings, and graphic layout elements. Forms, templates, specialized scanning adapters, and specific re-purposing applications all require a very high degree of accuracy in the layout definition of these layout elements. Such a very accurate layout definition of a digital document is commonly termed ground truth. The ground truth definition of a document should specify the type, location, size, resolution, and/or special treatment of these layout elements. Existing systems and methods require the user to be very hands-on in every step of the ground truth process. Further, existing systems and methods do not provide an output that is applicable to other image processing applications such as print-on-demand, document re-purposing, document classification and clustering, etc.
In accordance with an embodiment of the present invention, a method of processing an image comprises receiving a definition of at least one region in the image, where the region definition has a location specification and a type specification. The method further comprises displaying the boundaries of the at least one defined region according to its type specification, receiving a definition of a visible area in the image, the visible area definition having a specification of margins around the image, generating an image layout definition comprising the region definition and the visible area definition, and saving the image layout definition.
In accordance with yet another embodiment of the invention, a method of processing an image comprises determining a definition of at least one region in the image, the region definition having a location specification and a type specification. The method further comprises generating an image layout definition comprising the region definition, searching for an image layout definition template that best matches the generated image layout definition, and conforming the generated image layout definition to the best-matched image layout definition template.
In accordance with yet another embodiment of the invention, a system for processing an image comprises a graphical user interface operable to display the image and receive a definition of at least one region in the image, the region definition having a location specification and a type specification, the graphical user interface further operable to display the boundaries of the at least one defined region according to its type specification. The system further comprises a processor generating an image layout definition comprising the region definition.
For a more complete understanding of the present invention, the objects and advantages thereof, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
The preferred embodiment of the present invention and its advantages are best understood by referring to
According to an embodiment of image ground truth system 10 operating on a computer platform 12, an image document 14 is provided as input thereto. Computer platform 12 may be any device with a display and a processor. Computer platform 12 may be a portable device or a desktop device and typically comprises a pointing device such as a mouse, touch pad, touch screen, or a writing stylus. Image document 14 is preferably a scanned image of a document in a media file type such as Tagged Image File Format (.TIF), Bit Map (.BMP), Graphics Interchange Format (.GIF), Portable Document Format (.PDF), Joint Photographic Experts Group (.JPEG), etc. or an electronic document in a word processing format such as WORD (.DOC), Hypertext Markup Language (HTML), or another suitable document type. Image ground truth system 10 is operable to automatically analyze document 14 and detect zones in which the document layout elements are present. The document layout elements may include text, graphics, photographs, drawings, and other visible components in the document. Alternatively, system 10 permits the user to specify, using a graphical user interface 18, the various regions occupied by these layout elements. System 10 is operable to output a specification of the image document layout definition 16 in a specified format such as eXtensible Markup Language (XML). System 10 may also output the image document layout definition as a layout template to a template database 19. Template database 19 is a repository for templates that define the layout of image documents. A template comprises a definition of the region type, modality and other properties, visible area, and other specifications of the image document. Using predefined image document templates, new image documents can be quickly put together with new text, photograph, and graphic layout elements.
Furthermore, predefined templates may be used to conform image documents to correct inadvertent shifts during document scanning, for example, so that they follow a predefined format. An example of this process is shown in
Image layout definition 16 can serve as input to a variety of systems and applications. For example, image layout definition 16 may be used for document comparison and clustering/classification purposes. Further, image layout definition 16 may be used as a template for processing information. For example, image layout definition 16 may define a template with six photographic regions arranged in a certain layout. This template may be used to arrange and lay out photographs in a folder, for example. Image layout definition 16 may be easily compared with other templates or layout definition files to find the most suitable arrangement or layout of the photographs. The use of image layout definition 16 as a template also enables scanned document images that may have been slightly skewed or shifted to be corrected according to the layout specification in the template. In addition, image layout definition 16 may be used as input to a print-on-demand (POD) system that uses it to proof the layout of the documents as a measure for quality assurance. Image layout definition 16 may also be used to ensure proper rendering of a complex scanned document.
Region click-and-select process shown in block 32 enables a user to use a pointing device to indicate on the graphical user interface the location of points within regions of interest for classification and segmentation. For example, if the user clicks on a point on the image document displayed on the graphical user interface, the region containing the identified point is analyzed and the boundaries of the region are derived. The data type of the region containing the identified point is also determined. Therefore, the user may define the regions of the image document by successively clicking on a point within each region.
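The click-and-select behavior described above may be illustrated with a minimal sketch. The sketch assumes a binarized page held as a list of rows in which 0 marks background, and it substitutes a simple four-connected flood fill for whatever segmentation analysis an actual embodiment uses; the function name `region_from_click` and the returned dictionary are hypothetical. Classification of the region's data type is not attempted.

```python
from collections import deque


def region_from_click(image, click_row, click_col):
    """Derive the bounding box of the region containing a clicked point.

    Assumption: `image` is a list of rows where 0 is background and any
    other value is foreground. A 4-connected flood fill stands in for the
    real region analysis. Returns None when the click lands on background.
    """
    rows, cols = len(image), len(image[0])
    if image[click_row][click_col] == 0:
        return None

    seen = {(click_row, click_col)}
    queue = deque([(click_row, click_col)])
    ymin = ymax = click_row
    xmin = xmax = click_col

    while queue:
        r, c = queue.popleft()
        # Grow the bounding box to include the pixel just visited.
        ymin, ymax = min(ymin, r), max(ymax, r)
        xmin, xmax = min(xmin, c), max(xmax, c)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and image[nr][nc] != 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))

    return {"xmin": xmin, "xmax": xmax, "ymin": ymin, "ymax": ymax}
```

Calling the function once per click, with each click placed inside a different region, yields one bounding box per region, which mirrors the successive clicking described above.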
Automatic region analysis process shown in block 34 is a process that performs zoning analysis on the image document to form all of its regions using a segmentation process, and determine the region characteristics using a classification process. Various techniques are well known in the art for performing segmentation analysis, which fall into three broad categories: top-down strategy (model-driven), bottom-up strategy (data-driven), and a hybrid of these two strategies. Examples of these strategies are described in Theo Pavlidis and Jiangying Zhou, Page Segmentation and Classification, published in Document Image Analysis, pp 226-238, 1996, and Anil K. Jain and Bin Yu, Document Representation and Its Application to Page Decomposition, published in Pattern Analysis and Machine Intelligence, pp 294-308, Vol. 20, No. 3, March 1998. Various techniques are well known in the art for performing classification analysis, which are also described in the above references. Further, a suitable automatic zoning analysis process is implemented in the PrecisionScan software used in image capture devices such as the ScanJet 5300C manufactured by Hewlett-Packard Company of Palo Alto, Calif.
Process 20 further provides a third method of defining the regions in the image document, as shown in block 30. The process in block 30 enables the user to define a polygonal region, a rectangular region, and a visible area in the image document. This process is described in more detail below with reference to
In block 36, the defined regions in document 14 are displayed in graphical user interface 18, an example of which is shown in
In block 40 of
At block 76, the boundaries of the generated region are verified to ensure that the enclosed region does not overlap another region in the document and that the boundary lines of the region do not cross each other, for example. A separate and independent region manager 77 may be selected to enforce the adherence to a region enforcement model. For example, one region enforcement model may specify that no regions may have overlapping boundaries, another region enforcement model may specify that a text region may be overlaid over a background region and that the text is contained completely within the background region, or another region enforcement model may specify a permissible ordering of overlapping regions and what type of layout elements those overlapping regions may contain (commonly termed “multiple z-ordering”), etc. If region irregularities have been detected, a pop-up window containing an error message is displayed. Process 70 may automatically delete the irregular region(s) or crop or shift regions so that the enforcement models are followed.
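The two checks named for block 76 may be sketched as follows, under stated assumptions. Regions are compared by their bounding boxes, given as hypothetical `(xmin, ymin, xmax, ymax)` tuples, and only the simplest enforcement model (no overlapping regions) is shown. The boundary-crossing test detects proper crossings between non-adjacent polygon edges and does not handle collinear overlapping edges. All function names are illustrative only.

```python
def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share interior area.

    Boxes that merely touch along an edge are not treated as overlapping.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def check_no_overlap_model(boxes):
    """Return index pairs that violate a 'no overlapping regions' model."""
    return [
        (i, j)
        for i in range(len(boxes))
        for j in range(i + 1, len(boxes))
        if boxes_overlap(boxes[i], boxes[j])
    ]


def _orient(p, q, r):
    """Signed area test: >0 if r is left of p->q, <0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _segments_cross(p1, p2, p3, p4):
    """True only for a proper crossing (endpoints strictly on opposite sides)."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def polygon_self_intersects(vertices):
    """True if any two non-adjacent boundary edges of the polygon cross."""
    n = len(vertices)
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Adjacent edges share a vertex and cannot properly cross.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False
```

A region manager following a different enforcement model, such as containment of text within a background region or a z-ordering model, would replace `check_no_overlap_model` with its own rule while reusing the same geometric helpers.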
In block 78, the region type and modality and/or other definitions associated with the polygonal region are set to the default values. The default values may be determined a priori by the user or they may be system-wide defaults. A newly-created polygonal region may default to text and black-and-white type and modality values, respectively. These default values can be easily modified by the user to other values, such as described above and shown in
The type and modality of the newly-created rectangular region may be set to the default values of text and black-and-white, respectively, as shown in block 102. The newly-created rectangular region definition or the location of the rectangular region is generated and saved, along with other layout definitions of the document, as shown in block 104. The process ends in block 106.
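By way of illustration only, a rectangular region carrying the default type and modality may be represented as in the following sketch. The class name `RectRegion`, the helper `rect_from_corners`, and the assumption that the rectangle is specified by two opposite corner points are all hypothetical; only the default values of text and black-and-white come from the description above.

```python
from dataclasses import dataclass


@dataclass
class RectRegion:
    """Hypothetical container for a rectangular region definition."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    # Defaults named in the description; the user may change them later.
    region_type: str = "text"
    region_modality: str = "black-and-white"


def rect_from_corners(x0, y0, x1, y1):
    """Build a region from two opposite corners given in any order."""
    return RectRegion(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
```

Because the defaults are ordinary fields, changing a region from text to another type is a plain attribute assignment, for example `region.region_type = "photo"`.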
An example of a model for the layout definition specification output of process 20 is:
The above represents a model for the ground truthing metadata produced for each image document. The data is represented as integers (int), floating point (double) and enumerated types or strings. The rectangular boundaries are represented as “xmin,” “xmax,” “ymin,” and “ymax,” and the vertices as “vertex(xcoord,ycoord).” The region type and region modality are specified by “region_type” and “region_modality.” The notation “#PCDATA” is replaced by actual data obtained by document analysis. An example of a partial layout definition specification of the exemplary image document shown in
It may be seen that one specified region is a region containing a photograph layout element that is to be treated as a black-and-white layout element. Its boundaries and vertices have been defined in X and Y coordinates. Further, the visible area boundaries are also defined in terms of left, right, top and bottom margins. The use of a format such as XML for the layout definition yields many advantages. Image documents may be compared with one another, classified and clustered using the layout definition specification. The layout definition specification may also be provided as input to a print-on-demand system that uses the specification to “proof” its layout and to maintain print quality.
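A layout definition of the kind just described may be serialized to XML along the following lines. This is a sketch under stated assumptions, not the schema of the specification: the names `xmin`, `xmax`, `ymin`, `ymax`, `vertex`, `xcoord`, `ycoord`, `region_type`, and `region_modality` are taken from the description, whereas the element names `layout`, `region`, and `visible_area`, the margin element names, and the choice to store vertex coordinates as attributes are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_layout_definition(regions, margins):
    """Serialize region and visible-area definitions to an XML string.

    `regions` is a list of dicts; `margins` is a dict with left, right,
    top and bottom entries. Element nesting here is an assumption.
    """
    root = ET.Element("layout")  # hypothetical root element name
    for region in regions:
        node = ET.SubElement(root, "region")
        ET.SubElement(node, "region_type").text = region["region_type"]
        ET.SubElement(node, "region_modality").text = region["region_modality"]
        # Rectangular boundaries of the region.
        for key in ("xmin", "xmax", "ymin", "ymax"):
            ET.SubElement(node, key).text = str(region[key])
        # Optional polygon vertices, one element per vertex.
        for x, y in region.get("vertices", []):
            ET.SubElement(node, "vertex", xcoord=str(x), ycoord=str(y))

    area = ET.SubElement(root, "visible_area")  # hypothetical element name
    for side in ("left", "right", "top", "bottom"):
        ET.SubElement(area, side).text = str(margins[side])

    return ET.tostring(root, encoding="unicode")
```

Because the output is ordinary XML, a downstream application such as a document classifier or a print-on-demand proofing step can read it back with any standard XML parser.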
There are instances in which a user may desire to specify the size of the image file to limit the amount of memory needed to load the image file or limit the bandwidth needed to transmit the image file over a network to a remote ground truth system. Process 140 enables the user to specify a smaller size (and thus lower resolution) of the image to use for ground truthing, as shown in
With all variables to the right of the equal-to sign known, and if the resolution is the same in the X and the Y axes, the X and Y resolution can be computed. The computed resolution may be used to open the image file as well as to transmit the image file across network links to limit the memory size or bandwidth needed to process the image file. The process ends in block 150.
In the 1024×768 display screen resolution example above, if the space available to display the image after accounting for the graphical user interface is 974×668 pixels, and the size of the image is 8.5 in.×11 in., then the maximum resolution in the Y-axis is 668/11, which equals 60.7 pixels per inch (PPI). In the X-axis, the maximum resolution is 974/8.5, which equals 114.6 pixels per inch. Therefore, 60 pixels per inch is the selected resolution to display the image so that it can be viewed on the screen in its entirety. Because the region boundaries are defined in pixels that are easily scaled up or down in resolution, and because a user can choose integer divisor values of the original resolution to scale the boundaries, region boundary information is maintained without blurring.
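The arithmetic in this example (668/11 and 974/8.5) may be reproduced with a short sketch. The function names are hypothetical, and rounding the limiting resolution down to a whole number of pixels per inch is an assumption consistent with the selected value of 60. The second function illustrates scaling region boundaries by an integer divisor, assuming a box given as `(xmin, ymin, xmax, ymax)` in pixels of the original resolution.

```python
import math


def max_display_resolution(avail_w_px, avail_h_px, image_w_in, image_h_in):
    """Return (selected_ppi, x_ppi, y_ppi) for showing the whole image.

    The selected value is the smaller per-axis limit, rounded down so the
    image fits in the available display area along both axes.
    """
    x_ppi = avail_w_px / image_w_in
    y_ppi = avail_h_px / image_h_in
    return math.floor(min(x_ppi, y_ppi)), x_ppi, y_ppi


def scale_box(box, divisor):
    """Scale a pixel box down by an integer divisor of the resolution.

    Coordinates that are multiples of the divisor map exactly; others are
    rounded down by integer division.
    """
    return tuple(value // divisor for value in box)


selected, x_ppi, y_ppi = max_display_resolution(974, 668, 8.5, 11)
print(selected, round(x_ppi, 1), round(y_ppi, 1))
```

For instance, a box drawn at a hypothetical original resolution of 300 pixels per inch can be carried to 60 pixels per inch with a divisor of 5, and multiplying by the same integer restores the original coordinates.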
It may be seen that by using process 180, image documents can be standardized in format with very accurate region layout definitions that conform to the standard set forth in a template. The quality of the image documents so processed can be assured so that offset or skewed images can be detected and corrected. Furthermore, the treatment and processing of defined regions may be standardized according to the template.
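One way to picture the search for a best-matched template and the subsequent conforming step is sketched below. The description does not state how the match is scored, so the sketch assumes a score equal to the mean intersection-over-union between each region and the best same-type region in a template; the data layout (lists of `(type, (xmin, ymin, xmax, ymax))` pairs) and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_score(layout, template):
    """Assumed metric: mean best IoU against same-type template regions."""
    if not layout:
        return 0.0
    total = 0.0
    for rtype, box in layout:
        scores = [iou(box, tbox) for ttype, tbox in template if ttype == rtype]
        total += max(scores, default=0.0)
    return total / len(layout)


def best_template(layout, templates):
    """Return the name of the template with the highest match score."""
    return max(templates, key=lambda name: match_score(layout, templates[name]))


def conform(layout, template):
    """Snap each region to the best-overlapping same-type template region.

    Regions with no overlapping same-type counterpart are left unchanged.
    """
    conformed = []
    for rtype, box in layout:
        same_type = [tbox for ttype, tbox in template if ttype == rtype]
        best = max(same_type, key=lambda tbox: iou(box, tbox), default=None)
        if best is not None and iou(box, best) > 0:
            conformed.append((rtype, best))
        else:
            conformed.append((rtype, box))
    return conformed
```

Under this assumed metric, a scan whose regions are shifted by a few pixels still scores close to 1 against its own template, so the shift can be detected and removed by snapping to the template boundaries.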
Forms, templates, specialized scanning adapters, and specific re-purposing applications require a high degree of accuracy in layout definition. Embodiments of the present invention are operable to provide a highly accurate layout definition of an image document which specifies the location of layout elements in the image document, and their respective types and modalities. The present invention is operable to accept user input of region specification as well as to use automatic segmentation and classification analysis. The user may input the boundaries of the regions easily by clicking on a region or by defining the boundaries of the region using the graphical user interface. The layout definition output in eXtensible Markup Language format can be easily manipulated, processed, or used by other applications. The layout definition output may also be used as an image document template that can be used to conform subsequent image documents as to the location of the regions and visible areas as well as the region type and modality. The use of region management models also enables a user to conform the image document regions to the selected model. Further, the present invention enables a user to process a lower-resolution version of the image document to save on memory usage, processing resources and/or transmission bandwidth.
Number | Date | Country |
---|---|---|
20050076295 A1 | Apr 2005 | US |