Embodiments of the present invention are related to the field of document processing.
Optical Character Recognition (OCR) technology is widely used today to process a document image and recognize its characters in order to capture the content of the original document. However, OCR is typically still not entirely successful in capturing the content of a document. While OCR is usable in capturing the content of a document, other characteristics of the document may not be recognized, or re-created, using OCR.
As an example, OCR is not well suited for capturing the aesthetic qualities of a document comprising text and images. More specifically, OCR does not capture the spatial relationships between the various zones comprising a document, the colors used in the document, or other style elements which a user may wish to re-create in another document without necessarily re-using the content of the original document.
Embodiments of the present invention recite a system and method for creating an editable template from a document image. In one embodiment of the present invention, the spatial characteristics and the color characteristics of at least one region of a document are identified. A set of characteristics of a graphic representation within the region are then determined without the necessity of recognizing a character comprising the graphic representation. An editable template is then created comprising a second region having the same spatial characteristics and the same color characteristics of the at least one region of the document and comprising a second graphic representation which is defined by the set of characteristics of the first graphic representation.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention. Unless specifically noted, the drawings referred to in this description should be understood as not being drawn to scale.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the present invention will be described in conjunction with the following embodiments, it will be understood that they are not intended to limit the present invention to these embodiments alone. On the contrary, the present invention is intended to cover alternatives, modifications, and equivalents which may be included within the spirit and scope of the present invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, embodiments of the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.
Some portions of the detailed descriptions which follow are presented in terms of procedures, logic blocks, processing and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “identifying,” “determining,” “creating,” “differentiating,” “assigning,” “comparing” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
In embodiments of the present invention, the document image 101 is a scanned image of a document in a media file type such as Tag(ged) Image File Format (.TIF), Bit Map (.BMP), Graphic Interchange Format (.GIF), Portable Document Format (.PDF), Joint Photographic Experts Group (.JPEG), etc. or an electronic document in a word processing format such as WORD (.DOC), Hypertext Markup Language (HTML), or another suitable document type. Template creation system 120 is operable to automatically analyze document image 101 and detect regions in which the document layout elements are present. The document layout elements may include text, graphics, photographs, drawings, and other visible components of document image 101. Alternatively, template creation system 120 permits the user to manually specify, using a graphical user interface, the various regions occupied by these layout elements. Template creation system 120 is operable to output a specification of the image document layout definition in a specified format such as Extensible Markup Language (XML). In embodiments of the present invention, template creation system 120 outputs editable template 130 to, for example, a storage device such as a template database. In embodiments of the present invention, editable template 130 comprises a definition of the region type, modality and other properties, visible area, and other specifications of the document image (e.g., 101). Using predefined image document templates, new image documents can be quickly put together with new text, photograph, and graphic layout elements while still retaining the overall look and aesthetic qualities of the document image (e.g., 101). Furthermore, predefined templates such as editable template 130 may be used to conform image documents to a predefined format, for example to correct inadvertent shifts introduced during document scanning.
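As an illustrative sketch only, the following shows how a layout definition of the kind described above might be written out as XML. The element names, attribute names, and the dictionary layout of a region are hypothetical and are not taken from this description; only the use of XML as an output format is.

```python
import xml.etree.ElementTree as ET


def layout_to_xml(visible_area, regions):
    """Serialize a layout definition to an XML string.

    All tag and attribute names here are hypothetical placeholders.
    visible_area is (left, top, right, bottom); each region is a dict
    with "type", "modality", and a list of (x, y) "vertices".
    """
    root = ET.Element("layout")
    left, top, right, bottom = visible_area
    ET.SubElement(root, "visibleArea", left=str(left), top=str(top),
                  right=str(right), bottom=str(bottom))
    for region in regions:
        node = ET.SubElement(root, "region", type=region["type"],
                             modality=region["modality"])
        for x, y in region["vertices"]:
            ET.SubElement(node, "vertex", x=str(x), y=str(y))
    return ET.tostring(root, encoding="unicode")


example = layout_to_xml(
    (0, 0, 612, 792),
    [{"type": "text", "modality": "black-and-white",
      "vertices": [(72, 72), (540, 72), (540, 200), (72, 200)]}],
)
```

Storing vertices rather than a width and height lets the same structure describe both rectangular and polygonal regions.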
In embodiments of the present invention, an image layout definition can also serve as input to other systems and applications. For example, an image layout definition may be used for document comparison and clustering/classification purposes. Further, an image layout definition may be used as a template for processing information. For example, an image layout definition may define a template with six photographic regions arranged in a certain layout. This template may be used to arrange and lay out photographs in a folder, for example. An image layout definition may be easily compared with other templates or layout definition files to find the most suitable arrangement or layout of the photographs. The use of an image layout definition as a template (e.g., 130) also enables scanned document images that may have been slightly skewed or shifted to be corrected according to the layout specification in the template. In addition, an image layout definition may be used as input to a print-on-demand (POD) system that uses it to proof the layout of the documents as a measure for quality assurance. An image layout definition may also be used to ensure proper rendering of a complex scanned document.
With reference to
In one embodiment, a region click-and-select process enables a user to use a pointing device to indicate the location of points within regions of interest for classification and segmentation. For example, if the user clicks on a point 302 of document image 101 displayed on the graphical user interface of computer system 110, the region (e.g., 301) containing the identified point 302 is analyzed and the boundaries of the region (e.g., 301) are derived. The data type (e.g., an image) of the region containing the identified point is also determined. Therefore, the user may define the regions of document image 101 by successively clicking on a point within each region.
Automatic region analysis is a process that performs zoning analysis on document image 101 to form all of its regions using a segmentation process, and determine the region characteristics using a classification process. Various techniques are well-known in the art for performing segmentation analysis, which fall into three broad categories: top-down strategy (model-driven), bottom-up strategy (data-driven), and a hybrid of these two strategies. Various techniques are also well-known in the art for performing classification analysis. Alternatively, a user can manually define a polygonal region, a rectangular region, and a visible area in document image 101. This process is described in greater detail below.
In embodiments of the present invention, the defined regions in document image 101 are then displayed on computer system 110. In one embodiment, the boundaries of each region are outlined by color-coded lines. For example, a text region may be outlined in green, a color graphic region may be outlined in purple, a black and white graphic region may be outlined in blue, a photographic region may be outlined in yellow, etc. Furthermore, a user may provide or modify the layout definition of selected regions in the document in one embodiment. For example, the user may select a region containing a photographic element and change the current region type setting “photo” to another region type. The user may also verify or modify the layout specification by inputting the region modality (such as black and white, gray scale or color), highlighting a specific region, and deleting a region using the same pop-up submenu. By specifying the modality of a region, the bit-depth of the region is effectively changed. For example, a black-and-white setting may equate to a 1-bit bit-depth, a gray scale setting may equate to an 8-bit bit-depth, and a color setting may equate to a 24-bit bit-depth. Therefore, by giving the user the ability to change the modality and type of each region, the same image document can be modified to be used for another purpose, which is commonly known as re-purposing.
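The relationship between modality and bit-depth described above can be sketched as a simple lookup. This is a minimal illustration; the function name `repurpose` and the dictionary keys are hypothetical, while the three bit-depths are the ones given above.

```python
# Bit-depth implied by each modality, per the example values above.
BIT_DEPTH = {"black-and-white": 1, "gray scale": 8, "color": 24}


def repurpose(region, new_type=None, new_modality=None):
    """Return a copy of a region with an updated type and/or modality.

    The region is a plain dict (a hypothetical representation); the
    bit depth is re-derived from whichever modality ends up in effect.
    """
    updated = dict(region)
    if new_type is not None:
        updated["type"] = new_type
    if new_modality is not None:
        updated["modality"] = new_modality
    updated["bit_depth"] = BIT_DEPTH[updated["modality"]]
    return updated


photo = {"type": "photo", "modality": "color"}
gray_photo = repurpose(photo, new_modality="gray scale")
```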
In one embodiment, if desired, the user may also update the boundaries of the defined regions by selecting the region and then dragging the outline of the region boundaries to enlarge or contract the region by a process such as “rubberband boxing.” The user may also modify or specify the margins of document image 101 by selecting menu items associated with the visible area function. In one embodiment, the visible area of document image 101 defaults to the entire image, but the user may make the visible area smaller than the entire document image. In one embodiment, if the visible area specified by the user is too small to fully enclose any one region in document image 101, it is automatically expanded to include the full boundaries of all the regions in document image 101. A click-and-drag method can also be used to modify the visible area of the document image 101.
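The automatic expansion of the visible area can be sketched as follows. This is one possible reading, assuming axis-aligned boxes given as (left, top, right, bottom) and assuming the expanded area is the smallest box containing both the user's area and every region; the function name is hypothetical.

```python
def expand_visible_area(visible, regions):
    """Grow the visible area if it fails to enclose some region.

    Boxes are (left, top, right, bottom). If every region already fits,
    the user's visible area is returned unchanged.
    """
    def encloses(outer, inner):
        return (outer[0] <= inner[0] and outer[1] <= inner[1]
                and outer[2] >= inner[2] and outer[3] >= inner[3])

    if all(encloses(visible, region) for region in regions):
        return visible
    boxes = [visible] + list(regions)
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```

For instance, a visible area of (10, 10, 100, 100) with a region at (5, 20, 120, 50) would grow to (5, 10, 120, 100).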
As described above, one embodiment permits user definition of polygonal regions, rectangular regions, and visible areas in document image 101. Generally, polygonal regions are regions with non-rectangular boundaries or regions with more complex boundaries. To create a polygonal region, the user may select a create polygon function, and then the user may indicate the vertices of the polygon around the document layout element by successive clicks of the pointing device or mouse on document image 101. The displayed document image 101 shown on computer system 110 is updated continually on the screen to provide a visual feedback of the resulting lines and vertices of the polygonal region. In one embodiment, template creation system 120 may automatically close the polygonal region, in other words connect the first user-indicated vertex and the last user-indicated vertex. The user may indicate the completion of the vertices by selecting an appropriate function or by double-clicking when inputting the last vertex. The polygonal region is thus entered by the user.
In one embodiment, the boundaries of the generated region are verified to ensure that the enclosed region does not overlap another region in document image 101 and that the boundary lines of the region do not cross each other, for example. A separate and independent region manager may be selected to enforce the adherence to a region enforcement model in one embodiment. For example, one region enforcement model may specify that no regions may have overlapping boundaries, another region enforcement model may specify that a text region may be overlaid over a background region and that the text is contained completely within the background region, or another region enforcement model may specify a permissible ordering of overlapping regions and what type of layout elements those overlapping regions may contain (commonly termed “multiple z-ordering”), etc.
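The two boundary checks named above, that regions do not overlap and that a region's boundary lines do not cross each other, can be sketched as below. This is a simplified illustration under stated assumptions: overlap is tested on axis-aligned bounding boxes only, the polygon is closed automatically from the last vertex back to the first, and only proper crossings are detected (collinear overlaps are ignored). All function names are hypothetical.

```python
def boxes_overlap(a, b):
    """True when two (left, top, right, bottom) boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def _orient(a, b, c):
    # Signed area test: >0 if c is left of a->b, <0 if right, 0 if collinear.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, q1, q2):
    """True when the two segments cross at a point interior to both."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def boundary_self_crosses(vertices):
    """True when any two non-adjacent edges of the closed polygon cross."""
    n = len(vertices)
    # The modulo closes the polygon: last vertex connects back to the first.
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue  # adjacent edges share a vertex by construction
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False
```

A region manager enforcing a "no overlap" model could reject a new region whenever `boxes_overlap` is true against any existing region or `boundary_self_crosses` is true for its own vertices; other enforcement models would substitute different predicates.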
In one embodiment, the region type and modality and/or other definitions associated with the polygonal region are set to the default values. The default values may be determined a priori by the user or they may be system-wide defaults. A newly-created polygonal region may default to text and black-and-white type and modality values, respectively. These default values can be easily modified by the user to other values, such as described. A specification of the polygon region definition is generated in one embodiment. However, the generation of the polygonal region definition in a particular format, such as extensible Markup Language, may be performed when the entire document layout has been completed. The polygonal region can be saved along with the other document layout definitions of the document.
Additionally, a user can define a rectangular region in one embodiment. A rectangular region is, by definition, a four-sided area with 90 degree corners. The user may first select a create rectangular region function, and then indicate, using the pointing device on the graphical user interface, the first corner of the rectangle. A rubberband box is displayed on the graphical user interface which enables the user to drag or move the opposing corner of the rectangular region. In one embodiment, the boundaries of the generated rectangular region are verified by using a region manager to ensure that the resultant regions comply with a region enforcement model. For example, in one embodiment, the region may not be permitted to overlap another region in the document, and the boundary lines of the region may not be permitted to cross each other. Other examples of region enforcement models in accordance with an embodiment of the present invention comprise a specification that no regions may have overlapping boundaries, a specification that a text region may be overlaid over a background region and that the text is contained completely within the background region, or a specification of permissible ordering of overlapping regions and what type of layout elements those overlapping regions may contain (commonly termed “multiple z-ordering”), etc. In one embodiment, the type and modality of the newly-created rectangular region may be set to the default values of text and black-and-white, respectively.
As described above, the visible area definition specifies the outer boundaries around the edge of the document. In one embodiment, the user invokes the visible area functionality by selecting the create visible area function and indicates the first corner of the visible area. A rubberband box is then displayed in the graphical user interface to enable the user to manipulate the size (width and length) of the visible area. In one embodiment, the user then indicates the location of the opposite corner of the visible area using the pointing device. The resulting visible area boundaries are displayed and verified. In one embodiment, if the visible area boundaries are too small to fully enclose any one region in the document, its boundaries are automatically expanded to enclose the boundaries of all the regions in the document. The visible area definitions are generated and saved along with other document layout element layout definitions, for later use in creating editable template 130. The visible area layout specification is particularly important in electronic publication applications as it enables the user to specify the margins on the image, and thus the amount of white space around the boundaries of the page.
As shown in
In block 220 of
In one embodiment, color space conversion of document image 101 is performed. Assuming an input document image 101 is in the RGB (Red, Green, Blue) color space, the RGB input data is converted to a luminance/chrominance space, such as a known YCrCb space. In one embodiment, the conversion can be done using a pre-calculated look-up-table to speed up the computation as is implemented in some image/video compression programs. In one embodiment, when the image data is input in luminance, color space conversion can be omitted.
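A look-up-table conversion of this kind can be sketched as follows. The description does not state which coefficients are used; the sketch assumes the full-range ITU-R BT.601 coefficients used by JPEG/JFIF, and the table and function names are hypothetical.

```python
# Per-channel contribution tables, computed once for all 256 input levels.
# Coefficients are the full-range BT.601 (JFIF) values, an assumption here.
Y_R = [0.299 * v for v in range(256)]
Y_G = [0.587 * v for v in range(256)]
Y_B = [0.114 * v for v in range(256)]
CB_R = [-0.168736 * v for v in range(256)]
CB_G = [-0.331264 * v for v in range(256)]
CB_B = [0.5 * v for v in range(256)]
CR_R = [0.5 * v for v in range(256)]
CR_G = [-0.418688 * v for v in range(256)]
CR_B = [-0.081312 * v for v in range(256)]


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr) using table look-ups."""
    y = Y_R[r] + Y_G[g] + Y_B[b]
    cb = 128 + CB_R[r] + CB_G[g] + CB_B[b]
    cr = 128 + CR_R[r] + CR_G[g] + CR_B[b]
    # Round to the nearest integer and clamp to the 8-bit range.
    return tuple(min(255, max(0, round(c))) for c in (y, cb, cr))
```

The tables replace nine multiplications per pixel with nine indexed reads, which is the speed-up the look-up-table approach is aiming for. When only text extraction is of interest, the Y channel alone may be sufficient.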
In one embodiment, smoothing (e.g., low-pass filtering) is then performed, which is useful in eliminating some noise effects. In one embodiment, whether smoothing is performed is determined by the resolution at which document image 101 was acquired and the minimum size of the characters which can be processed. Therefore, it is appreciated that smoothing is not performed in some situations. In one embodiment, a Gaussian low-pass filter may be applied to provide a requisite level of smoothing.
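As a minimal sketch of Gaussian low-pass filtering, the following smooths a single row of luminance values with a normalized one-dimensional kernel; applying it to rows and then to columns gives a separable two-dimensional filter. The choice of sigma, radius, and border handling (edge replication) are assumptions, not values from this description.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_row(row, kernel):
    """Convolve one row with the kernel, replicating the border pixels."""
    radius = len(kernel) // 2
    last = len(row) - 1
    out = []
    for x in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            src = min(last, max(0, x + k - radius))
            acc += weight * row[src]
        out.append(acc)
    return out


kernel = gaussian_kernel(sigma=1.0, radius=2)
smoothed_spike = smooth_row([0, 0, 9, 0, 0], kernel)
```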
In one embodiment, edges within the image are identified and classified as either NON EDGE, WHITE EDGE, or BLACK EDGE. In one embodiment, this comprises calculating a vertical gradient, a horizontal gradient, and the magnitude of the gradient. A discrete Laplacian (a second derivative) is then calculated and each pixel is then classified as either NON EDGE, WHITE EDGE, or BLACK EDGE.
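One way such a three-way edge classification might be realized is sketched below. The specific operators are assumptions: central differences for the gradients, a 4-neighbor discrete Laplacian, a single magnitude threshold, and the convention that a positive Laplacian (a pixel darker than its neighbors) marks the dark side of an edge. None of these choices is stated in this description.

```python
NON_EDGE, WHITE_EDGE, BLACK_EDGE = "NON_EDGE", "WHITE_EDGE", "BLACK_EDGE"


def classify_edges(lum, threshold):
    """Label every pixel of a 2-D luminance grid as one of three classes."""
    height, width = len(lum), len(lum[0])

    def at(y, x):
        # Replicate border pixels so every neighbor look-up is valid.
        return lum[min(height - 1, max(0, y))][min(width - 1, max(0, x))]

    labels = [[NON_EDGE] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            gx = (at(y, x + 1) - at(y, x - 1)) / 2  # horizontal gradient
            gy = (at(y + 1, x) - at(y - 1, x)) / 2  # vertical gradient
            magnitude = (gx * gx + gy * gy) ** 0.5
            if magnitude < threshold:
                continue  # weak gradient: leave as NON_EDGE
            laplacian = (at(y, x - 1) + at(y, x + 1) + at(y - 1, x)
                         + at(y + 1, x) - 4 * at(y, x))
            # Assumed convention: positive Laplacian = dark side of the edge.
            labels[y][x] = BLACK_EDGE if laplacian > 0 else WHITE_EDGE
    return labels


step = [[0, 0, 0, 255, 255, 255] for _ in range(3)]
step_labels = classify_edges(step, threshold=50)
```

On the vertical step above, the last dark pixel before the transition is labeled BLACK_EDGE and the first bright pixel after it is labeled WHITE_EDGE.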
In one embodiment, horizontal line segments are classified by edge-bounded averaging. For example, for a horizontal line, an analysis proceeds from left to right to identify consecutive segments of NON EDGE pixels and EDGE (including both WHITE and BLACK) pixels. Each NON EDGE segment is potentially the interior of a text or graphic character. In one embodiment, a NON EDGE segment, except at the left and right image border, is sandwiched by two edge segments.
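The left-to-right scan into alternating EDGE and NON EDGE segments can be sketched as follows. The sketch covers only the segmentation step and the selection of NON EDGE segments that are sandwiched by edge segments; the averaging and the subsequent interior classification are not shown. The label strings and function names are hypothetical.

```python
from itertools import groupby


def edge_runs(row_labels):
    """Split one row into (is_edge, start, length) runs, left to right."""
    runs, start = [], 0
    for is_edge, group in groupby(row_labels, key=lambda lab: lab != "NON_EDGE"):
        length = len(list(group))
        runs.append((is_edge, start, length))
        start += length
    return runs


def interior_candidates(row_labels):
    """Return (start, length) of NON_EDGE runs bounded by edges on both sides."""
    runs = edge_runs(row_labels)
    return [(start, length)
            for i, (is_edge, start, length) in enumerate(runs)
            if not is_edge and 0 < i < len(runs) - 1]


row = ["NON_EDGE", "NON_EDGE", "BLACK_EDGE", "NON_EDGE", "NON_EDGE",
       "WHITE_EDGE", "NON_EDGE"]
```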
In one embodiment, vertical consistency is also accounted for. For example, for a segment tentatively classified as BLACK INTERIOR (or WHITE INTERIOR), the number of pixels classified as WHITE INTERIOR (or BLACK INTERIOR) in the previous line is counted. If the number is larger than a preset percentage of the segment length, the segment may be disqualified as text or a graphic character, and it is classified as NON TEXT.
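The vertical consistency test can be sketched as a simple count against the previous line. This sketch assumes the count is taken over the same column span as the segment, which the description does not state explicitly; the function name and label strings are hypothetical.

```python
def passes_vertical_consistency(tentative, start, length, previous_row,
                                max_fraction):
    """Return False when the segment should be demoted to NON_TEXT.

    tentative is "BLACK_INTERIOR" or "WHITE_INTERIOR"; max_fraction is the
    preset percentage expressed as a fraction of the segment length.
    """
    opposite = ("WHITE_INTERIOR" if tentative == "BLACK_INTERIOR"
                else "BLACK_INTERIOR")
    span = previous_row[start:start + length]
    conflicts = sum(1 for label in span if label == opposite)
    # Disqualify only when the count is strictly larger than the limit.
    return conflicts <= max_fraction * length


previous = ["WHITE_INTERIOR"] * 3 + ["NON_TEXT"] * 3
```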
In one embodiment, vertical segments classified as NON TEXT are examined to determine whether some of them can be reclassified using vertical filling criteria. In one embodiment, the length of a segment should be less than a given number which may depend upon the resolution of document image 101. Additionally, the immediate neighbor pixels of the two ends should be of compatible types. For example, BLACK INTERIOR and BLACK EDGE, or WHITE INTERIOR and WHITE EDGE, may be identified as compatible types of neighbor pixels. Among those qualified segments, segments whose length is 1 and whose two end neighbors are both edges of the same type (either BLACK EDGE or WHITE EDGE) are distinguished. A segment of this type is preferably reclassified as the same type as its end neighbors. Any other qualified segment can be reclassified as BLACK INTERIOR if its end neighbors are either BLACK INTERIOR or BLACK EDGE, and as WHITE INTERIOR if its end neighbors are either WHITE INTERIOR or WHITE EDGE.
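The vertical filling criteria can be sketched for a single image column as follows. The function name, label strings, and the `max_length` parameter (standing in for the resolution-dependent limit) are hypothetical.

```python
BLACK = {"BLACK_INTERIOR", "BLACK_EDGE"}
WHITE = {"WHITE_INTERIOR", "WHITE_EDGE"}


def fill_column(column, max_length):
    """Reclassify short NON_TEXT runs in one column; returns a new list."""
    out = list(column)
    n = len(column)
    y = 0
    while y < n:
        if column[y] != "NON_TEXT":
            y += 1
            continue
        end = y
        while end < n and column[end] == "NON_TEXT":
            end += 1
        length = end - y
        # A run qualifies only if it is short and has a neighbor at both ends.
        if y > 0 and end < n and length < max_length:
            above, below = column[y - 1], column[end]
            if (length == 1 and above == below
                    and above in ("BLACK_EDGE", "WHITE_EDGE")):
                out[y] = above  # single pixel between two like edges
            elif above in BLACK and below in BLACK:
                out[y:end] = ["BLACK_INTERIOR"] * length
            elif above in WHITE and below in WHITE:
                out[y:end] = ["WHITE_INTERIOR"] * length
        y = end
    return out
```

Runs whose two end neighbors are of incompatible types, or that touch the top or bottom of the image, are left as NON_TEXT.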
In one embodiment, vertical consistency analysis is performed upon pixels not yet classified as NON TEXT. In one embodiment, vertical consistency analysis identifies horizontal segments characterized by consecutive pixels not classified as edges (WHITE EDGE, BLACK EDGE, and a newly introduced DELETED EDGE) and having a length exceeding a length threshold. In one embodiment, each pixel within such a segment should be WHITE INTERIOR, BLACK INTERIOR, or NON TEXT. DELETED EDGE refers to a pixel that is an edge pixel, but does not qualify as a text pixel.
In one embodiment, pixel connectivity analysis is also performed to identify aggregates of pixels that have been identified as candidates for text and to collect their statistics at the same time. In one embodiment, the aggregate is called a sub-blob. Two pixels belong to the same sub-blob if they are 8-neighbor connected, and they are labeled as the same category BLACK (EDGE or INTERIOR) or WHITE (EDGE or INTERIOR).
In one embodiment, sub-blobs are examined. For example, in one embodiment if the total number of pixels is less than a given threshold, the sub-blob is marked as NON TEXT.
In one embodiment, 8-neighbor sub-blobs not marked as NON TEXT are grouped into blobs. The connectivity of sub-blobs is the same as for pixels in one embodiment. In other words, two sub-blobs, whether they are white sub-blobs or black sub-blobs, are connected if they share at least one 8-connected pixel pair. Typically, there is no constraint on the number and topological arrangement of sub-blobs within one blob. The following statistics for each blob are collected in one embodiment: the number of outer border pixels and the number of inner sub-blobs. An outer border pixel is a pixel that belongs to the blob and neighbors a NON TEXT pixel. An inner sub-blob is a sub-blob that belongs to the blob and does not connect to any pixel that does not belong to the blob.
In one embodiment, text pixels are next identified. A complex document image may include dark characters on light background, light characters on dark background and/or characters on top of pictorial regions. Correspondingly, a blob may contain both black and white sub-blobs. In order to identify text pixels, a determination is made of which type (black or white) of sub-blob is text. One embodiment classifies all pixels within an image as text and non-text using a binary notation (e.g., where a bit 1 represents a text pixel and a bit 0 represents a non-text pixel). Alternatively, a bit 0 may be used to represent a text pixel and a bit 1 to represent a non-text pixel.
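The sub-blob formation and size test described above can be sketched as an 8-connected component search. This is an illustration under assumptions: labels are strings whose prefix gives the BLACK or WHITE category, the only statistic collected is the pixel list, and undersized sub-blobs are relabeled in place. The function names are hypothetical.

```python
from collections import deque


def polarity(label):
    """Map a pixel label to its category, or None for non-candidates."""
    if label.startswith("BLACK"):
        return "BLACK"
    if label.startswith("WHITE"):
        return "WHITE"
    return None


def find_sub_blobs(labels, min_pixels):
    """Return [(category, pixels)] for sub-blobs of at least min_pixels.

    Smaller sub-blobs are marked NON_TEXT in the labels grid (in place).
    """
    height, width = len(labels), len(labels[0])
    seen = [[False] * width for _ in range(height)]
    kept = []
    for y in range(height):
        for x in range(width):
            kind = polarity(labels[y][x])
            if kind is None or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):          # visit all 8 neighbors
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx]
                                and polarity(labels[ny][nx]) == kind):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(pixels) >= min_pixels:
                kept.append((kind, pixels))
            else:
                for py, px in pixels:
                    labels[py][px] = "NON_TEXT"
    return kept


grid = [
    ["BLACK_EDGE", "NON_TEXT", "WHITE_EDGE"],
    ["NON_TEXT", "BLACK_INTERIOR", "NON_TEXT"],
    ["NON_TEXT", "NON_TEXT", "NON_TEXT"],
]
sub_blobs = find_sub_blobs(grid, min_pixels=2)
```

In the small grid above, the two black pixels touch only diagonally, so they form one sub-blob under 8-connectivity, while the isolated white pixel falls below the size threshold.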
Thus, one embodiment provides text region extraction. Compound document images are images containing mixtures of text characters, line drawings, and continuous toned pictorial regions. Block 220 allows extraction of sharp edge components such as letters, numbers, line drawings, logos, symbols, etc. from document image 101. Additionally, block 220 facilitates detecting these components when they are overlying images or colored backgrounds. In contrast to Optical Character Recognition (OCR) systems, block 220 can detect and separate these components from document image 101 without the necessity of recognizing a letter, number, symbol, line drawing, logo, or other graphic representation.
Referring now to
In block 230 of
Referring now to
In block 240 of
In one embodiment, a derivative of the horizontal projection is then obtained. Assuming that the line of text is horizontal, a measure of the pixel intensities along each line of text is made which is plotted as a projection profile. In one embodiment, the maximal derivative values of the positive and negative slopes of the projection profile are plotted which are descriptive of the X-height lines and baseline respectively of a given line of text. With reference to
In one embodiment, the projection lines are sorted and projection lines having a maximal derivative value greater than a given threshold are obtained. In one embodiment, to detect the text line and X-height line, the local maximum peaks are selected and derivatives are sorted in descending order. The projection lines that have a derivative larger than a pre-determined threshold (e.g., the average intensity of the image) are selected and analyzed.
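The projection profile, its derivative, and the selection of strong local peaks can be sketched as follows. The sketch assumes a binary text mask (1 for a text pixel) with rows indexed from top to bottom, a first difference as the derivative, and a caller-supplied threshold; the function names are hypothetical.

```python
def horizontal_projection(text_mask):
    """Sum the text pixels along each row of a binary mask."""
    return [sum(row) for row in text_mask]


def candidate_lines(profile, threshold):
    """Return (magnitude, index, direction) peaks, strongest first.

    Index i refers to the transition between row i and row i + 1. With
    rows counted from the top, a rising transition is a candidate
    X-height line and a falling one is a candidate baseline.
    """
    deriv = [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]
    peaks = []
    for i, d in enumerate(deriv):
        magnitude = abs(d)
        left = abs(deriv[i - 1]) if i > 0 else 0
        right = abs(deriv[i + 1]) if i + 1 < len(deriv) else 0
        if magnitude > threshold and magnitude >= left and magnitude >= right:
            peaks.append((magnitude, i, "rising" if d > 0 else "falling"))
    return sorted(peaks, reverse=True)


mask = [[0] * 5, [0] * 5, [1] * 5, [1] * 5, [1] * 5, [0] * 5, [0] * 5]
profile = horizontal_projection(mask)
```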
In one embodiment, the projection lines are then filtered based on the distance between adjacent projection lines. For example, in one embodiment the selected projection lines are first filtered based on their distances from adjacent lines and on their projection intensities: the distance between two adjacent lines (e.g., between lines 401 and 402, between lines 403 and 404, between lines 405 and 406, or between lines 407 and 408) must be larger than 3 points (at 72 dpi resolution). If the distance between two adjacent lines is not larger than 3 points, only the projection line having the higher intensity value is selected.
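The distance filter can be sketched as a single pass over the candidate lines in top-to-bottom order. The sketch assumes lines are given as (row, intensity) pairs with rows in pixels, and converts the 3-point minimum to pixels using 72 points per inch; the function name and parameters are hypothetical.

```python
def filter_close_lines(lines, resolution_dpi, min_points=3.0):
    """Drop the weaker of any two lines that are too close together.

    lines is a list of (row, intensity); rows are in pixels. Two lines
    must be more than min_points apart, measured at 72 points per inch.
    """
    min_pixels = min_points * resolution_dpi / 72.0
    kept = []
    for row, intensity in sorted(lines):
        if kept and row - kept[-1][0] <= min_pixels:
            # Too close to the previous survivor: keep the stronger one.
            if intensity > kept[-1][1]:
                kept[-1] = (row, intensity)
        else:
            kept.append((row, intensity))
    return kept
```

At 72 dpi, the candidates (10, 5), (12, 9), and (30, 4) would reduce to (12, 9) and (30, 4), since the first two are only 2 pixels apart and the second has the higher intensity.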
In one embodiment, the projection lines are filtered so that, for each projection line, the average signal intensity between adjacent lines is higher than average on one side of the projection line and lower than average on the other side. This is performed in order to detect both the baseline and the X-height line for a text line. The projection lines should alternate as baselines and X-height lines. The average signal intensity between each pair of lines is measured. Typically, the average image intensity between a base line and an X-height (e.g., between X-height line 401 and baseline 402 of
Font size = 2*R/72.27 dpi*dT;
Line spacing = 2*R/72.27 dpi*dL
As shown in
In block 250 of
In block 260 of
In one embodiment, the place holder for each text region has the estimated font size and line spacing as determined in block 240 above. Text content within that region is filled with “Text . . . ” in one embodiment as a place holder of the characteristics of that particular region. Referring to
The overall look and feel of document image 101 is retained in editable template 130 in accordance with an embodiment of the present invention. However, the content of document image 101 is not retained in editable template 130. Thus, a user can use editable template 130 to create a new document which has the same aesthetic qualities as document image 101, but containing different content.
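The separation of look and feel from content can be sketched as a copy that keeps a region's geometry, color, and text metrics while replacing its content with the place holder. The `Region` structure and its field names are hypothetical; the place holder string is the one given above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PLACEHOLDER = "Text . . ."


@dataclass(frozen=True)
class Region:
    """Hypothetical record for one region of a document or template."""
    box: Tuple[int, int, int, int]          # left, top, right, bottom
    kind: str                               # e.g. "text" or "photo"
    color: Tuple[int, int, int]             # representative color
    font_size: Optional[float] = None
    line_spacing: Optional[float] = None
    content: Optional[str] = None


def to_template_region(source: Region) -> Region:
    """Copy spatial, color, and text characteristics; drop the content."""
    return Region(
        box=source.box,
        kind=source.kind,
        color=source.color,
        font_size=source.font_size,
        line_spacing=source.line_spacing,
        content=PLACEHOLDER if source.kind == "text" else None,
    )


original = Region((72, 72, 540, 200), "text", (20, 20, 20), 11.0, 14.0,
                  "Quarterly results")
template_region = to_template_region(original)
```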
In one embodiment, system 500 further comprises a graphic characteristics determiner 502 for determining a set of characteristics of a first graphic representation within at least one identified region. Furthermore, this can be performed without the necessity of recognizing a character comprising the graphic representation. As described above, embodiments of the present invention are operable for determining graphic characteristics such as font size and line spacing of a text region of a document.
In one embodiment, system 500 further comprises an editable template creator 503. As described above, embodiments of the present invention are operable for creating an editable template (e.g., 130) comprising a second region (e.g., 310a) having the same spatial characteristics of said at least one region (e.g., 310) and comprising a second graphic representation which is defined by said set of characteristics of said first graphic representation. In other words, the text, or other graphic representation, will have the same font size and line spacing as the text in the identified region of document image 101.
In one embodiment, system 500 further comprises an automatic region differentiator 504. As described above with reference to block 210 of
In one embodiment, system 500 further comprises a graphic representation identifier 505 for identifying at least one graphic representation based upon the contrast of the graphic representation and for determining that a region comprising a graphic representation overlies an image region. With reference to
In one embodiment, system 500 further comprises a color determiner 506. In embodiments of the present invention, color determiner 506 is for determining a representative color of a region (e.g., 310 of document image 101). It is noted that color determiner 506 determines a representative color for each region identified from document image 101 in embodiments of the present invention.
In one embodiment, system 500 further comprises a color assigner 507. In embodiments of the present invention, color assigner 507 assigns the representative color determined by color determiner 506 to a corresponding region of editable template 130. For example, the representative color of region 310 is assigned to region 310a of editable template 130 by color assigner 507. It is noted that each representative color of an identified region of document image 101 is assigned to a corresponding region of editable template 130 in embodiments of the present invention. Thus, a representative color of an image region (e.g., 301 of
In one embodiment, system 500 further comprises a signal intensity determiner 508. In embodiments of the present invention, signal intensity determiner 508 is used as described above with reference to block 240 in determining a font size and line spacing of font within an identified text region. In one embodiment, signal intensity determiner 508 determines the signal intensity of a text region (e.g., an average signal intensity of text region 310a) as a whole. Signal intensity determiner 508 then determines the signal intensity of each line of text, or other graphic representation (e.g., between lines 401 and 402 of
In one embodiment, system 500 further comprises a comparator 509 for comparing the average signal intensity of text region 310a with a signal intensity of a line of text and for comparing the average signal intensity of text region 310a with the signal intensity of an area between two lines of text.
In embodiments of the present invention, system 500 further comprises a font size deriver 510 for deriving the font size of a line of text as described above with reference to block 240 of
In embodiments of the present invention, system 500 further comprises a line spacing deriver 511 for deriving a line spacing of text, or other graphic representations, comprising a text region as described above with reference to block 240 of
In block 620 of
In block 630 of
In block 720 of
In block 730 of
In block 740 of
In block 750 of
In block 760 of
With reference to
In the present embodiment, computer system 800 includes an address/data bus 801 for conveying digital information between the various components, a central processor unit (CPU) 802 for processing the digital information and instructions, a volatile main memory 803 comprised of volatile random access memory (RAM) for storing the digital information and instructions, and a non-volatile read only memory (ROM) 804 for storing information and instructions of a more permanent nature. In addition, computer system 800 may also include a data storage device 805 (e.g., a magnetic, optical, floppy, or tape drive or the like) for storing vast amounts of data. It should be noted that the software program for creating an editable template from a document image of the present invention can be stored either in volatile memory 803, data storage device 805, or in an external storage device (not shown).
Devices which are optionally coupled to computer system 800 include a display device 806 for displaying information to a computer user, an alpha-numeric input device 807 (e.g., a keyboard), and a cursor control device 808 (e.g., mouse, trackball, light pen, etc.) for inputting data, selections, updates, etc. Computer system 800 can also include a mechanism for emitting an audible signal (not shown).
Returning still to
Furthermore, computer system 800 can include an input/output (I/O) signal unit (e.g., interface) 809 for interfacing with a peripheral device 810 (e.g., a computer network, modem, mass storage device, etc.). Accordingly, computer system 800 may be coupled in a network, such as a client/server environment, whereby a number of clients (e.g., personal computers, workstations, portable computers, minicomputers, terminals, etc.) are used to run processes for performing desired tasks. In particular, computer system 800 can be coupled in a system for creating an editable template from a document.
The preferred embodiment of the present invention, a system and method for creating an editable template from a document, is thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the following claims.