The present invention relates generally to scanning documents, and more particularly to a scanner calibration strip, to a scanner, and to a method for segmenting a scanned document image into text symbols and halftoned non-text images.
Scanners are used to scan an image to create a scanned image which can be displayed on a computer monitor, used by a computer program, printed, faxed, etc. Conventional scanners include a housing having a calibration strip which includes a rectangular calibration strip medium which has a scannable, completely white surface having a predetermined level of “whiteness”. The scanner also includes a controller and a scan bar having a linear array of sensor elements (i.e., optical sensor elements). Each sensor element produces a signal proportional to the amount of light reaching the element. The signal gives a light value of the pixel of the image read by that sensor element. System components are not perfect, and their performance may degrade over time.
At times during scanner operation, a scanned calibration image of the white calibration strip medium is obtained using the scan bar, and tone-correction filtering is performed on the light value read by each sensor element. The tone-correction filtering has filtering parameters whose values are determined so that the corrected (filtered) light value of the sensor element will ideally match the predetermined “whiteness” level of the white calibration strip medium. For subsequent scans, the light value read by each sensor element is adjusted using the determined filtering parameters for that sensor element.
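The per-element correction described above can be illustrated with a minimal sketch. It assumes the simplest possible filter, a single multiplicative gain per sensor element, which is an assumption made here for illustration and not a filter stated in the text; the function names `white_gains` and `apply_gains` are likewise hypothetical.

```python
def white_gains(calibration_values, target_white=255):
    """Return one gain per sensor element so that the element's reading of
    the white calibration strip medium maps to target_white.
    A pure gain model is an assumption made for this sketch."""
    # max(v, 1) avoids division by zero for a dead (zero-reading) element.
    return [target_white / max(v, 1) for v in calibration_values]


def apply_gains(scan_line, gains, top=255):
    """Correct one scanned line using the gains found during calibration,
    clipping the result to the largest representable light value."""
    return [min(top, round(v * g)) for v, g in zip(scan_line, gains)]
```

In this sketch the gains are computed once from a scan of the white strip and then reused for every subsequent scan line until the next calibration.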
A scanned document image may include text symbols and halftoned non-text images. Conventional image segmentation divides the scanned document image into areas of text symbols and areas of halftoned non-text images.
Segmentation algorithms having segmentation parameters are used to classify a read pixel as a text pixel or as a halftoned non-text pixel based on nearby pixels. Areas having a predetermined degree of “whiteness” are classified as background areas. Then, in one example, a first special filter is applied to the sensor element light values of an area having a text symbol to “sharpen” the text symbol, and a second special filter is applied to the sensor element light values of an area having a halftoned non-text image to suppress objectionable moiré patterns.
What is needed is an improved scanner calibration strip, an improved scanner, and an improved method for segmenting a scanned document image into text symbols and halftoned non-text images.
A first expression of an embodiment of the present invention is for a scanner calibration strip medium. The scanner calibration strip medium includes at least one of a plurality of spaced-apart text symbols and a plurality of spaced-apart, halftoned non-text images.
A second expression of an embodiment of the present invention is for a scanner including a scan bar, a scanner housing, and a scanner calibration strip medium disposed on the scanner housing and scannable by the scan bar. The scanner calibration strip medium includes a plurality of spaced-apart text symbols having text pixels and includes a plurality of spaced-apart, halftoned non-text images having halftoned non-text pixels.
A method of the present invention is for segmenting a scanned document image of a scanner into text symbols and halftoned non-text images. The scanner includes a scanner calibration strip medium. The scanner calibration strip medium includes a plurality of spaced-apart text symbols having text pixels and includes a plurality of spaced-apart, halftoned non-text images having halftoned non-text pixels. The scanner calibration strip medium has a known pixel classification including which pixels of the scanner calibration strip medium are text pixels and which pixels of the scanner calibration strip medium are halftoned non-text pixels.
The method includes obtaining a scanned calibration image of the scanner calibration strip medium, wherein the scanned calibration image includes pixels having light values. The method also includes performing a tone-correction filtering of the light values of the scanned calibration image using filtering parameters each having a value and performing a segmentation of the scanned calibration image into text symbols and halftoned non-text images without using the pixel classification of the scanner calibration strip medium, wherein the segmentation uses segmentation parameters each having a value. The method further includes adjusting the values of the filtering parameters and the values of the segmentation parameters, as required, for a subsequent tone-correction filtering and segmentation of the scanned calibration image to better match the pixel classification of the scanner calibration strip medium. The method also includes tone-correction filtering the scanned document image using the adjusted values of the filtering parameters and segmenting the scanned document image using the adjusted values of the segmentation parameters.
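How well a segmentation of the scanned calibration image "matches" the known pixel classification can be expressed as a per-pixel error. The following is a minimal sketch under stated assumptions: classifications are held as equally sized lists of rows of labels, and the function name `classification_error` is hypothetical. With a 0/1 mismatch indicator per pixel, the mean square error reduces to the fraction of mismatched pixels.

```python
def classification_error(predicted, known):
    """Fraction of pixels whose predicted class differs from the known
    class of the scanner calibration strip medium.
    Both arguments are lists of rows of labels with identical shape."""
    total = 0
    mismatched = 0
    for predicted_row, known_row in zip(predicted, known):
        for p, k in zip(predicted_row, known_row):
            total += 1
            if p != k:
                mismatched += 1
    return mismatched / total
```

Adjusting the filtering and segmentation parameters then amounts to choosing the parameter values that make this error as small as possible.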
Several benefits and advantages are derived from the scanner calibration strip, the scanner, and/or the method of the present invention. In one example, the method improves the accuracy of classifying pixels of a scanned document image as pixels of a text symbol or as pixels of a halftoned non-text image. The method may be employed at various times to account for degradation in the performance of system components over time. Such improved segmentation of a scanned document image allows “sharpening” filtering to be applied only to a text symbol and allows “moiré-suppressing” filtering to be applied only to a halftoned non-text image, which together improve the quality of the scanned document image.
An embodiment of the present invention is shown in
In other embodiments, the scanner calibration strip medium may be any material which is uniform, has a high reflectance, and is of a neutral tone. In an alternate embodiment, the scanner calibration strip medium 12 may be the scanner calibration strip 10. In another alternate embodiment, the scanner calibration strip medium 12 may be embedded in the scanner calibration strip 10. In yet another alternate embodiment, the scanner calibration strip medium 12 may be attached or otherwise adhered to the scanner calibration strip 10.
In one enablement of the first expression of the embodiment of
In one implementation of the first expression of the embodiment of
In one application of the first expression of the embodiment of
A second expression of the embodiment of
In one employment of the second expression of the embodiment of
In one variation of the second expression of the embodiment of
A method of the present invention illustrated in
Referring to
In one illustration of the exemplary method of
In one extension of the exemplary method, wherein the scanner calibration strip 10 has a white background, a pixel of the scanned document image having a level of “whiteness” exceeding a predetermined value is considered to be a background pixel. In one arrangement, the light values range from 0 to 255, and in one example the white background of the scanner calibration strip is considered to have a light value substantially equal to 255.
Conventional tone-correction filtering algorithms and conventional scanned-image segmentation algorithms are well known in the art. In one realization of the method of
An example of a conventional tone-correction filtering algorithm is:
$$y = \alpha\,x^{1/\gamma} + (1-\alpha)\,x^{\gamma}$$
and is implemented in a lookup table wherein y=the output value and x=the input value and a first filtering parameter α=0.1 to 0.9 in increments of 0.1 and a second filtering parameter γ=0.1 to 2.9 in increments of 0.2. Another conventional tone-correction algorithm may be used in alternate embodiments. One such conventional tone-correction algorithm tested involved three filtering parameters and yielded results substantially equivalent to the two-filtering-parameter tone-correction algorithm described above.
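The lookup-table implementation of the two-parameter tone correction may be sketched as follows. The sketch assumes 8-bit light values (0 to 255) that are normalised to the range 0 to 1 before the formula is applied and rescaled afterwards; that normalisation, and the names `build_tone_lut`, `ALPHAS`, and `GAMMAS`, are assumptions made for illustration.

```python
def build_tone_lut(alpha, gamma, levels=256):
    """Build a lookup table for y = alpha*x**(1/gamma) + (1-alpha)*x**gamma.
    Input light values 0..levels-1 are normalised to [0, 1] (an assumption),
    corrected, then rescaled and clipped back to integers."""
    top = levels - 1
    lut = []
    for value in range(levels):
        x = value / top
        y = alpha * x ** (1.0 / gamma) + (1.0 - alpha) * x ** gamma
        lut.append(min(top, max(0, round(y * top))))
    return lut


# Candidate parameter values over the ranges stated in the text.
ALPHAS = [round(0.1 * k, 1) for k in range(1, 10)]      # 0.1, 0.2, ..., 0.9
GAMMAS = [round(0.1 + 0.2 * k, 1) for k in range(15)]   # 0.1, 0.3, ..., 2.9
```

Under these ranges there are 9 × 15 = 135 candidate lookup tables, each of which maps 0 to 0 and the maximum light value to itself.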
An example of a conventional segmentation algorithm is as follows. A 5×5 mathematical window is slid over the scanned calibration image. Each pixel in the window is first passed through a lookup table employed by a conventional tone-correction filtering algorithm to adjust pixel intensities. Then modified edge detection is performed on the scanned calibration image as follows. If the absolute value of the difference between the light value of a particular neighboring pixel in the window and the light value of the center (or other) pixel in the window is equal to or greater than a particular value of a particular filtering parameter (e.g., a threshold value of a threshold parameter), then the center pixel is classified as a text pixel; otherwise the center pixel is classified as a halftoned image pixel. However, if the center pixel has a “whiteness” greater than a predetermined value, it is classified as a background pixel. The typical number of threshold parameters is seven, with each having a limited dynamic range. A 5×5 window is found to work well for typical text sizes scanned at 600 dpi. Larger windows would be more appropriate for higher resolution scans.
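A simplified sketch of this windowed classification is given below. It departs from the description in ways that are assumptions made for brevity: a single threshold is used instead of seven threshold parameters, every pixel in the window is compared against the center pixel, the window is clipped at the image border, and the background level of 250 is an arbitrary illustrative value. The function name `segment` and the label constants are hypothetical.

```python
TEXT, HALFTONE, BACKGROUND = "T", "H", "B"


def segment(image, lut, edge_threshold, white_level=250, half=2):
    """Classify each pixel of a grayscale image (list of rows of 0..255).
    half=2 gives the 5x5 window; lut is a tone-correction lookup table."""
    rows, cols = len(image), len(image[0])
    # Tone-correct every pixel through the lookup table first.
    toned = [[lut[v] for v in row] for row in image]
    labels = []
    for r in range(rows):
        label_row = []
        for c in range(cols):
            center = toned[r][c]
            if center > white_level:
                # Sufficiently white pixels are background.
                label_row.append(BACKGROUND)
                continue
            # Edge test: any window pixel differing strongly from the center.
            is_edge = any(
                abs(toned[rr][cc] - center) >= edge_threshold
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            )
            label_row.append(TEXT if is_edge else HALFTONE)
        labels.append(label_row)
    return labels
```

For example, a single dark pixel on a white field is labeled as text because it differs strongly from its neighbors, while a uniform mid-gray field produces no strong differences and is labeled as halftone throughout.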
To obtain the optimal adjusted filtering and segmentation parameters, an exhaustive search can be used since the number of parameters and the dynamic range of parameter adjustment are limited. Alternatively, a genetic algorithm can be used.
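An exhaustive search over a limited parameter grid may be sketched as follows. The sketch is generic: the caller supplies the candidate values for each parameter and an error function that, in the present context, would tone-correct and segment the scanned calibration image and compare the result against the known pixel classification. The function name `exhaustive_search` and the dictionary-based interface are assumptions made for illustration.

```python
from itertools import product


def exhaustive_search(grids, error):
    """Try every combination of candidate parameter values.
    grids: dict mapping parameter name -> list of candidate values.
    error: callable taking a dict of parameter values, returning a number.
    Returns the best parameter dict and its error."""
    names = list(grids)
    best_params, best_error = None, float("inf")
    for combo in product(*(grids[name] for name in names)):
        params = dict(zip(names, combo))
        e = error(params)
        if e < best_error:
            best_params, best_error = params, e
    return best_params, best_error
```

The cost grows as the product of the grid sizes, which is why this approach is practical only when each parameter has a limited range.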
An example of a conventional genetic algorithm which optimizes the filtering and segmentation parameters together treats each parameter as a chromosome; concatenates the parameters to form a gene; uses a mutation rate of 0.005 and a crossover rate in the range of 0.95 to 0.99; uses 100 generations; and uses 20 genes in each generation. For each generation, a fitness function is used to determine the fitness of each gene. The fitness function is the inverse of the mean square error of how well particular values of the parameters cause the tone-correction filtering and segmentation of the scanned calibration image to match the pixel classification of the scanner calibration strip medium.
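A genetic search using the stated settings may be sketched as follows, keeping the terminology above in which each parameter is a chromosome and the concatenated parameters form a gene. Several details are not given in the text and are assumptions made for illustration: each chromosome is stored as an index into a list of candidate values, parents are chosen with fitness-proportional selection, crossover is single-point, the best gene is carried over unchanged to the next generation, and a small constant is added to the error to avoid division by zero. The function name `genetic_search` is hypothetical.

```python
import random


def genetic_search(grids, mse, generations=100, population=20,
                   crossover_rate=0.95, mutation_rate=0.005, seed=0):
    """grids: dict of parameter name -> candidate values.
    mse: callable taking a parameter dict, returning a non-negative error."""
    rng = random.Random(seed)
    names = list(grids)
    sizes = [len(grids[n]) for n in names]

    def decode(gene):
        return {n: grids[n][i] for n, i in zip(names, gene)}

    def fitness(gene):
        # Inverse of the error; the epsilon is an assumption of this sketch.
        return 1.0 / (mse(decode(gene)) + 1e-9)

    genes = [[rng.randrange(s) for s in sizes] for _ in range(population)]
    best = max(genes, key=fitness)
    for _ in range(generations):
        weights = [fitness(g) for g in genes]
        children = [list(best)]  # keep the best gene (assumption)
        while len(children) < population:
            a, b = rng.choices(genes, weights=weights, k=2)
            child = list(a)
            if len(names) > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, len(names))
                child = a[:cut] + b[cut:]
            for i, s in enumerate(sizes):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(s)
            children.append(child)
        genes = children
        best = max(genes, key=fitness)
    return decode(best)
```

Because the best gene is always retained in this sketch, the best fitness found never decreases from one generation to the next; the result is still not guaranteed to be the global optimum.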
Several benefits and advantages are derived from the scanner calibration strip, the scanner, and/or the method of the present invention. In one example, the method improves the accuracy of classifying pixels of a scanned document image as pixels of a text symbol or as pixels of a halftoned non-text image. The method may be employed at various times to account for degradation in the performance of system components over time. Such improved segmentation of a scanned document image allows “sharpening” filtering to be applied only to a text symbol and allows “moiré-suppressing” filtering to be applied only to a halftoned non-text image, which together improve the quality of the scanned document image.
The foregoing description of several expressions of an embodiment and of a method of the present invention has been presented for purposes of illustration. It is not intended to be exhaustive or to limit the present invention to the precise actions and/or forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. It is intended that the scope of the present invention be defined by the claims appended hereto.