This invention generally relates to image thresholding and separation of foreground from background images and more particularly relates to a method for obtaining a high quality bitonal image from a document that has a significant amount of background color content.
In a production scanning environment, the digital output of a scanned paper document is often represented and stored in binary (black and white) form because of its greater efficiency in storage and transmission, particularly for textual images. Binary form is also well suited to text scanning and optical character recognition (OCR).
Typically, a scanner is used for scanning a document in order to obtain, from a charge coupled device (CCD) sensor, digital grey scale signals at 8 bits per pixel. Conversion of this 8-bit per pixel grey scale data to 1-bit per pixel binary data then requires some type of image thresholding process. Because image thresholding is an image data reduction process, it often results in unwanted image artifacts or some loss or degradation of image information. Errors in image thresholding can cause problems such as speckle noise in the document background or loss of low contrast characters.
There have been a number of attempts to improve image thresholding and obtain a binary image of improved quality. For example, commonly-assigned U.S. Pat. No. 4,868,670 (Morton et al.) discloses tracking a background value in an image, with a threshold value being a sum of a tracked background value, a noise value, and a feedback signal. Whenever an edge or other transition occurs in the image, the feedback signal is momentarily varied in a pre-defined pattern to momentarily modify the threshold value so that an output filtered thresholded pixel value has a reduced noise content. However, background tracking presents significant difficulties, particularly where objects of interest are at relatively low contrast. A different approach is the adaptive thresholding described in U.S. Pat. No. 4,468,704 (Stoffel et al.). Here, thresholding is implemented by using an image offset potential, which is obtained on a pixel-by-pixel basis as a function of white peak and black valley potentials in the image. This offset potential is used in conjunction with nearest neighbor pixels to provide an updated threshold value that is adaptive, varying pixel-by-pixel. The peak and valley potentials are generated, for each image pixel, by comparing the image potential of that pixel with predetermined minimum white peak and maximum black valley potentials. Unfortunately, this technique also appears to exhibit difficulties in extracting low contrast objects in a thresholded image.
Commonly-assigned U.S. Pat. No. 5,583,659 (Lee et al.), incorporated herein in its entirety, discloses significant improvements to adaptive thresholding, such as is done on a pixel-by-pixel basis in the general scheme outlined in the '704 Stoffel et al. patent listed earlier. In the method described, localized intensity gradient data is first computed for each scanned greyscale pixel and can be used to determine whether or not the pixel is in the vicinity of an edge transition. Subsequent processing is then performed to further classify the pixel as part of an edge or flat field, object or background. The processed output image is enhanced in this way to provide improved thresholding. Significantly, two variable user inputs are used as thresholds to fine-tune the image data processing. When the best possible values for these variables are obtained, adaptive thresholding provides an image that can be accurately converted to bitonal data.
Extracting text and images of interest from a complex color background can be particularly difficult, and proposed conventional solutions achieve only limited success. While some conventional methods may be usable for limited types of simple multicolor documents, these methods are not well suited to documents having complex color content. Instead, some additional type of post-processing is typically called for, such as algorithms that connect neighboring pixels to identify likely text characters or OCR techniques for obtaining text character information from noisy greyscale data.
Although advances such as adaptive approaches have been made, and even though it has become practical to scan three-color RGB data from a document, accurate thresholding continues to pose a challenge. This difficulty can be particularly acute when it is necessary to scan and obtain text information from documents that have significant background color content.
Recent commercial banking legislation, known to those in banking as Check 21, has caused heightened interest in the need for more accurate thresholding and conversion of images to binary data. With this legislation, electronically scanned image data from a check can be allowed the same legal status as the original signed paper check document. Scanned check data is used to form an image replacement document (IRD) that serves as a substitute check. Once this electronic image of the check is obtained, the original paper check can then be destroyed. The touted benefits of this development for the banking institution include cost reduction and faster transaction speeds. In the conversion from a paper check to a digital image, the Check 21 legislation requires accurate transformation of the data into bitonal or binary form for reasons including reduced image storage requirements and improved legibility.
Even with advances in image scanning and analysis, complex background color content still presents a hurdle to taking advantage of the benefits of Check 21 and of other capabilities made possible using an electronically scanned image. For example, while there is at least some standardization of dimensions and of the locations of various information fields on bank checks, background content can differ considerably from one check to another. So-called “personalized” or custom checks from various check printers can include a variable range of color image content, so that even checks used within the same account can have different backgrounds. To complicate the problem further, there is no requirement that data recorded on the check be written in any particular pen color, a constraint that could otherwise simplify text extraction. Moreover, the information regions of interest can vary from one check to the next. As a result, it can still be difficult to provide a fully automated binary scan of each check in which the information of interest is reliably legible. A large percentage of images for scanned checks currently contain excessive background residual content and noise that not only reduce data legibility, but can also significantly increase image file size. File size inefficiencies, in turn, exact costs in added transmission time, storage space, and overall processing overhead, particularly considering the huge number of checks being scanned each day.
Clearly, there is a need for an improved scanning system and process that is capable of producing a clear, readable binary image of text or other image content without the need for a visual image quality inspection and subsequent adjustment of variables and reprocessing. Ideally, an improved system and process would be sufficiently compatible with currently available scanning components to allow the use of the system on scanner equipment that is presently in use, and to minimize the need for the design and manufacture of new components.
It is an object of the present invention to provide a method for obtaining bitonal image data from a document that has a significant amount of background color content.
From another aspect, the present invention provides a method for obtaining a bitonal image from a document using scanned data from two or more color channels.
It is a feature of the present invention that it provides threshold values used to obtain a bitonal image based on scanned data from two or more color channels. The scanned color data is used to provide a high contrast object grey scale image that is processed using adaptive thresholding.
It is an advantage of the present invention that it provides a method for obtaining a bitonal image from a scanned document with improved quality over images obtained using conventional methods.
It is a further advantage of the present invention that it provides a method for automating the selection of intensity and gradient thresholds for adaptive thresholding, eliminating the need for operator guesswork to provide these values.
These and other objects, features, and advantages of the present invention will become apparent to those skilled in the art upon a reading of the following detailed description when taken in conjunction with the drawings wherein there is shown and described an illustrative embodiment of the invention.
While the specification concludes with claims particularly pointing out and distinctly claiming the subject matter of the present invention, it is believed that the invention will be better understood from the following description when taken in conjunction with the accompanying drawings, wherein:
The present description is directed in particular to elements forming part of, or cooperating more directly with, apparatus in accordance with the invention. It is to be understood that elements not specifically shown or described may take various forms well known to those skilled in the art.
Using the method of the present invention, a color scan of a document is obtained and values obtained from the scanned image data are used to generate an enhanced bitonal image with reduced noise content. The color scan data is first used for identifying objects or regions of interest on the document and the most likely color of text or other image content within each region. Within each region of interest, color content of the foreground object of interest and of the background is then detected. Color scan data that shows the intensity or density for a color channel is then analyzed and used to generate a high contrast object grey scale (HCOGS) image. Edge detection logic then detects features having the largest gradient in the region of interest, so that accurate gradient thresholds and intensity thresholds can be generated for control of adaptive thresholding. The high contrast object grey scale image is converted to a bitonal image using adaptive thresholding, employing the generated gradient and intensity thresholds.
The method of the present invention works in conjunction with the multi-windowing adaptive thresholding methods disclosed in the '659 Lee et al. patent noted earlier in the background section, the disclosure of which is incorporated herein in its entirety. In terms of data flow, the methods of the present invention are applied further “upstream” in image processing. The resulting enhanced image and processing variables that are generated using the method of the present invention can be used as input to the adaptive thresholding procedure of the '659 Lee et al. disclosure, thereby providing optimized input and tuned variables for successful execution of adaptive thresholding.
The method of the present invention has the goal of obtaining the best possible separation between foreground content of a document and its background content. The type of foreground content varies depending on the document. For example, with a personal check, foreground content includes text entered by the payor, which may require further processing such as OCR. Other types of documents may include printed text foreground content or other image content. Background content may have one or more colors and may include significant amounts of graphic content. Unlike the background, the foreground content is generally of a single color.
Referring now to the overall processing sequence of the method, a color scan of the document is first obtained, providing image data in two or more color channels.
An important preparatory step for using the multicolor scan data efficiently is to identify one or more regions of interest on the document. A region of interest can be understood to be an area of the document that contains the foreground text or image content that is of interest and that may also contain some amount of unwanted background content. A region of interest could cover the entire scanned area; in most cases, however, such as with personal checks, only one or a few discrete regions of interest are located on the document. Typically, regions of interest are rectangular.
An identify regions of interest step 120 is used to perform this function. There are a number of methods for selecting or detecting a region of interest; the method that is most useful in an individual case can depend on the type of document itself. For example, for scanned personal checks or other bank transaction documents, the size of the document and the relative locations of its region(s) of interest, such as for check amount, payee, and date, are typically well-defined. In such a case, no sophisticated methods would be necessary for identifying a region of interest as part of step 120; it would simply be necessary to determine some base origin point in the scanned data and to measure a suitable relative distance from that origin to locate each region of interest. As one alternative, dimensional coordinate values entered on a keyboard, or provided using some other user input mechanism such as a mouse, keypad, or touchscreen, could be employed. Other methods for automatically finding the region of interest include detecting the edges of horizontal lines using edge detection software; a 1-D Sobel edge detector could be used for this purpose, for example, as shown in the sketch that follows. Edge detection might also be used to help minimize skew effects in the scanned data. When scanning personal checks, for example, there are a small number of reference lines that can be detected in this manner. By performing edge detection over a small range of angles about the vertical, image processing algorithms can determine and compensate for a slight amount of skew in the scanned data.
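By way of illustration, the following Python sketch shows one simple way to locate candidate horizontal reference lines using a 1-D Sobel-style derivative kernel. The function name, the use of a row-wise mean intensity profile, and the strength threshold are illustrative assumptions rather than elements of the disclosure:

    import numpy as np

    def find_horizontal_lines(grey, min_strength=20.0):
        # Locate candidate horizontal reference lines by applying a 1-D
        # derivative (Sobel-style) kernel to the row-wise mean intensity
        # profile of a grey scale image.
        profile = grey.astype(float).mean(axis=1)   # mean intensity of each row
        kernel = np.array([-1.0, 0.0, 1.0])         # 1-D derivative kernel
        gradient = np.convolve(profile, kernel, mode="same")
        # Rows showing a strong light-to-dark or dark-to-light transition
        return np.nonzero(np.abs(gradient) > min_strength)[0]

Row indices returned in this way can then serve as reference positions from which the relative offsets to each region of interest are measured.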
Among the various techniques that have been proposed for identifying the region of interest containing text against a complex background are those described in the research paper entitled “Locating Text in Complex Color Images” by Yu Zhong, Kalle Karu, and Anil K. Jain in Pattern Recognition, Vol. 28, No. 10, 1995, pp. 1523-1535. Approaches described by these authors include connected component analysis, used for detection of horizontal text characters, where these characters have a color that is sufficiently distinct from the background content. Other approaches include spatial variance analysis, detecting the sharp transitions that indicate a row of horizontal text characters. Authors Zhong, Karu, and Jain also propose a hybrid algorithm that incorporates strengths of both connected component and spatial variance methods. As noted by these authors, however, the methods they employ require empirically tuned parameters and achieve only limited success where the text and background color content are too similar or where text characters are connected to each other, such as in handwritten or cursive text.
In many cases, documents of a certain class have one or more reference markings that help to locate foreground text or other content of interest. In one embodiment, such reference markings, once detected, are used to locate each region of interest relative to them.
Within each identified region of interest, color content of the foreground text or other foreground image content and color content of the background can then be detected as part of identify regions of interest step 120. This can be determined in a number of ways. In one embodiment, the three RGB channels are each checked to determine which channel has the largest contrast difference for the object(s) of interest within the region of interest. Image data from this channel is then used to locate the desired text or foreground image content, based on the observation that the desired image content is darker than the surrounding background. Histogram analysis can be used as a part of this process, or as validation, to isolate the desired foreground text or image content as the highest-density pixels, typically amounting to no more than about 20% of the pixels within the limited region of interest.
Once the set of pixels containing foreground image content has been identified, the data value in each color channel (typically RGB) for each of these pixels is used to determine the color of the foreground image or text. This foreground content color is typically computed as the averaged red, green, and blue values of pixels in this set. The background color is then computed as the averaged RGB values of pixels outside the foreground image pixel set, as in the sketch that follows. Alternately, a grey scale image could be generated from the scanned color image data and processed to identify one or more regions of interest.
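A minimal sketch of this color estimation, assuming 8-bit RGB data and adopting a max-minus-min spread as the contrast measure and a fixed darkest-20% cutoff (both illustrative choices rather than requirements of the method), could take the following form:

    import numpy as np

    def estimate_fg_bg_colors(rgb_roi, fg_fraction=0.20):
        # Choose the channel with the largest intensity spread as a
        # simple contrast measure.
        spreads = [int(rgb_roi[..., c].max()) - int(rgb_roi[..., c].min())
                   for c in range(3)]
        best = int(np.argmax(spreads))
        channel = rgb_roi[..., best].astype(float)
        # Treat the darkest fg_fraction of pixels as foreground candidates.
        cutoff = np.quantile(channel, fg_fraction)
        fg_mask = channel <= cutoff
        fg_color = rgb_roi[fg_mask].mean(axis=0)    # averaged RGB of foreground
        bg_color = rgb_roi[~fg_mask].mean(axis=0)   # averaged RGB of background
        return fg_color, bg_color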
Using the processing steps just described, identify regions of interest step 120 has identified one or more regions of interest on the document and, within each region, the color composition of the foreground text or other image and of the predominant portion of the background in the region of interest. These important image attributes are used for generating the HCOGS image and the gradient threshold (GT) and intensity threshold (IT) values for each region in the processing steps that follow. It is important to emphasize that each region of interest on a document can be handled individually, allowing the generation of local GT and IT threshold values for each region of interest. This capability may or may not be important in any specific application, but it does allow the flexibility to provide bitonal images for documents where background content is highly complex or even where foreground text or image content in different regions of the same document may be in different colors.
Referring again to the processing sequence, a high contrast object grey scale image generation step 140 forms the HCOGS image for each region of interest. In one embodiment, the single color channel that provides the greatest contrast between the detected foreground and background colors is used directly as the grey scale image; alternately, a combination of two of the color channels can be used.
As yet another alternative, the HCOGS image can be generated from all three of the color channels. For example, for a substantially neutral foreground object, an averaging of the Red, Green, and Blue values may be used, so that each grey scale pixel value is formed using:
Grey=(R+G+B)/3
Still other alternatives for arriving at a grey scale value include more complex combinations using weighted values, such that each color plane value has a scalar multiplier or where division is by other than an integer, as in the following generalized example:
Grey=(wr×R+wg×G+wb×B)/(wr+wg+wb)
where wr, wg, and wb are scalar weighting values for the respective color planes.
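These alternatives can be summarized in a brief sketch; equal weights reproduce the simple three-channel average given above, while unequal weights give the more general weighted combination. The function name and the clipping to 8-bit range are illustrative assumptions:

    import numpy as np

    def grey_from_rgb(rgb, weights=(1.0, 1.0, 1.0)):
        # Weighted combination of the R, G, and B planes; the default
        # equal weights yield Grey = (R + G + B) / 3.
        w = np.asarray(weights, dtype=float)
        grey = (rgb.astype(float) * w).sum(axis=-1) / w.sum()
        return grey.clip(0, 255).astype(np.uint8)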
The exemplary sequence that follows illustrates how the high contrast object grey scale image can be obtained for an example personal check 20.
As is shown for the expanded high contrast object grey scale image generation step 140, color difference values are first computed between each pair of color channels. For the foreground text in an example region R2, the small letter t in subscripts indicates the measured text value in the data and T represents the difference in computed text color value, computed using the different color channels, as follows:
T2rg=|R2t−G2t|
T2rb=|R2t−B2t|
T2gb=|G2t−B2t|
For the background in region R2, the small letter b in subscripts indicates the measured background value in the data and Q represents the difference in computed background color value, computed using the different color channels, as follows:
Q2rg=|R2b−G2b|
Q2rb=|R2b−B2b|
Q2gb=|G2b−B2b|
Still referring to step 140, these T and Q difference values indicate which color channel, or combination of channels, yields the greatest contrast between foreground and background content within the region of interest, and that channel or combination is used to form the high contrast object grey scale image. By way of example, if the differences T2rg and Q2rg are largest for region R2, the red and green channel data would be used in forming the grey scale image for that region.
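This selection can be sketched as follows; the use of the sum T+Q as the ranking score is an illustrative assumption, since any suitable combination of the foreground and background differences could be used to rank the channel pairs:

    def select_channel_pair(fg_color, bg_color):
        # fg_color and bg_color are (R, G, B) triples for a region,
        # such as those computed for region R2 above.
        pairs = {"rg": (0, 1), "rb": (0, 2), "gb": (1, 2)}
        scores = {}
        for name, (i, j) in pairs.items():
            t = abs(fg_color[i] - fg_color[j])   # e.g. T2rg = |R2t - G2t|
            q = abs(bg_color[i] - bg_color[j])   # e.g. Q2rg = |R2b - G2b|
            scores[name] = t + q
        # Channel pair giving the largest combined difference
        return max(scores, key=scores.get), scores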
In this way, at the conclusion of high contrast object grey scale image generation step 140, a high contrast object grey scale image has been generated for each region of interest, providing the input for the threshold generation steps that follow.
The next sequence of steps generates the gradient threshold (GT) and intensity threshold (IT) values for each region of interest. A gradient value is first obtained at each pixel of the high contrast object grey scale image, using a 3×3 Sobel operator in one embodiment. A histogram of the image records the number of pixels N(L) having each grey scale value L and, in the same pass, the gradient values obtained for the pixels at each grey scale value are accumulated.
Thus, for example, each time a pixel having a grey scale value (L) of 112 is encountered, the gradient value obtained at that pixel is added to all previous gradient values for grey scale value 112. In this way, an accumulated sum GS(L) is obtained for each grey scale value L. For example, if the histogram shows that there are 67 pixels having a grey scale value of 112, the accumulated sum GS(112) is the accumulated total of all of the 67 gradient values obtained for these pixels.
In order to use these summed values, an averaged gradient AG(L) is computed as part of an averaged gradient computation step 162. To obtain an averaged gradient for each grey scale value L, the following straightforward division is used:
AG(L)=GS(L)/N(L)
Thus, continuing with the example given earlier, for the 67 pixels having a grey scale value of 112, the corresponding averaged gradient AG(112) is computed as:
AG(112)=GS(112)/67
This computation is executed for each grey scale value L. The result can be represented as a curve of averaged gradient AG(L) plotted against grey scale value L.
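The accumulation of N(L), GS(L), and AG(L) can be sketched as follows, assuming an 8-bit HCOGS image and a 3×3 Sobel gradient magnitude; the edge-replicating border treatment is an illustrative assumption:

    import numpy as np

    def averaged_gradient(hcogs):
        # Gradient magnitude at each pixel from 3x3 Sobel operators.
        img = hcogs.astype(float)
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
        ky = kx.T
        pad = np.pad(img, 1, mode="edge")
        gx = np.zeros_like(img)
        gy = np.zeros_like(img)
        h, w = img.shape
        for dy in range(3):
            for dx in range(3):
                shifted = pad[dy:dy + h, dx:dx + w]
                gx += kx[dy, dx] * shifted
                gy += ky[dy, dx] * shifted
        grad = np.hypot(gx, gy)
        # N(L): pixel count per grey scale value; GS(L): summed gradients.
        levels = hcogs.astype(int).ravel()
        n = np.bincount(levels, minlength=256)
        gs = np.bincount(levels, weights=grad.ravel(), minlength=256)
        # AG(L) = GS(L) / N(L), defined only where pixels exist at level L.
        ag = np.where(n > 0, gs / np.maximum(n, 1), 0.0)
        return n, gs, ag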
Still referring to this sequence of steps, peaks in the averaged gradient AG(L) identify candidate intensity threshold (IT) values, with the averaged gradient at a chosen candidate providing its corresponding gradient threshold (GT) value. For each candidate IT value, a Text Area Percentage is then computed as the percentage of pixels in the region of interest having grey scale values below that candidate. In the present example, candidate IT values of 94 and 32 yield:
Text Area Percentage at L<94=30%
Text Area Percentage at L<32=6%
Given these computed Text Area Percentages, the candidate IT value of 94 is too high. The candidate IT value of 32, on the other hand, yields an area percentage of about 6%, which is in the desired range. A resultant IT value of 32, along with its corresponding resultant GT value, is then used for further processing of example region R2.
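A sketch of this candidate evaluation follows; the 20% ceiling and the rule of accepting the largest candidate whose Text Area Percentage falls within range are illustrative assumptions consistent with the example above:

    import numpy as np

    def choose_intensity_threshold(hcogs, candidates, max_percentage=20.0):
        # Text Area Percentage: share of pixels darker than the candidate.
        total = hcogs.size
        for it in sorted(candidates, reverse=True):
            pct = 100.0 * np.count_nonzero(hcogs < it) / total
            if pct <= max_percentage:
                return it        # largest candidate within the allowed range
        return None

With candidates 94 and 32 as in the example, the 30% area at L<94 exceeds the ceiling, while the roughly 6% area at L<32 is acceptable, so an IT value of 32 is returned.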
The sequence of steps 150, 160, and 170 is performed for each region of interest in one embodiment. As a result of this processing sequence, tuned IT and GT threshold values are available for each region of interest on the document.
An adaptive thresholding step 180 executes a thresholding process in order to generate a bitonal or binary image output for the document that was originally scanned in multiple color channels. This thresholding step 180 is adaptive in the sense that the IT and GT threshold values that are provided to it can control its response to image data within a specific region of interest. These threshold values can differ not only between separate documents, but also between separate regions of interest within the same document. In one embodiment, adaptive thresholding step 180 executes the processing sequence disclosed in the '659 Lee et al. patent cited earlier.
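Because the multi-windowing details of the '659 procedure are beyond the scope of this description, the following greatly simplified stand-in (not the '659 method itself) merely illustrates how region-specific IT and GT values can drive an adaptive decision at each pixel:

    import numpy as np

    def simple_adaptive_threshold(hcogs, gt, it, win=7):
        # A pixel is marked black when darker than the intensity
        # threshold IT, or when it lies in a high-contrast (edge)
        # neighborhood and is darker than the neighborhood mean.
        img = hcogs.astype(float)
        h, w = img.shape
        out = np.zeros((h, w), dtype=np.uint8)
        r = win // 2
        pad = np.pad(img, r, mode="edge")
        for y in range(h):
            for x in range(w):
                window = pad[y:y + win, x:x + win]
                local_contrast = window.max() - window.min()
                if img[y, x] < it:
                    out[y, x] = 1                       # dark foreground pixel
                elif local_contrast > gt and img[y, x] < window.mean():
                    out[y, x] = 1                       # edge pixel darker than surround
        return out   # 1 = black (foreground), 0 = white (background)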
Using the processing summarized above, a clear, legible bitonal image can be generated automatically from the multicolor scanned data, without operator inspection of image quality or manual adjustment of threshold variables.
The invention has been described in detail with particular reference to certain preferred embodiments thereof, but it will be understood that variations and modifications can be effected by a person of ordinary skill in the art without departing from the scope of the invention as described above and as noted in the appended claims. For example, a number of different techniques could be used as alternatives to the 3×3 Sobel operator for obtaining gradient values G(L) at each pixel location. A scalar gradient sensitivity factor could be used to adjust the gradient values G(L) obtained, such as multiplying by a default value (0.8 in one embodiment). Different scalar values could be used depending on the color plane data or in order to compensate for differences in scanner sensitivity.
Scanning itself could be performed on a variety of documents and at a range of resolutions. The scan data could comprise two or more color channels, such as conventional RGB data of which only two channels are used. A scanner obtaining more than three color channels could also be used, with the method extended to obtain bitonal data using color information from four or more channels.
Thus, what is provided is a method for obtaining a high quality bitonal image from a document that has a significant amount of background color content, using color scanned data.