A document, such as a scanned document, may include a plurality of words, phrases, objects and/or symbols. Additionally, various actions may be performed on the document. The document may be analyzed such that specific words, phrases, objects and/or symbols may be identified.
Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements. The figures are not necessarily to scale, and the size of some parts may be exaggerated to more clearly illustrate the example shown. Moreover, the drawings provide examples and/or implementations consistent with the description; however, the description is not limited to the examples and/or implementations provided in the drawings.
Most printers, scanners, cameras, and/or multi-function print devices (MFPs) provide several features, such as an option to scan a physical document, which may be controlled via an on-device control panel, a connected application, and/or a remote service. Other options may include printing, copying, faxing, document assembly, etc. The scanning portion of an MFP may comprise an optical assembly located within a sealed enclosure. The sealed enclosure may have a scan window through which the optical assembly can scan a document, which may be placed on a flatbed and/or delivered by a sheet-feeder mechanism.
Such devices may interact with a document in numerous ways, such as printing, copying, scanning, faxing, etc. Such a document may include a plurality of sections for user input and/or user action. For instance, a document may include an object or a plurality of objects indicating a location for a signature, a location for an initial, and/or a location for other user input, images, charts, schematics, and/or background colors and/or patterns, among other examples.
In some circumstances, a scanner uses a predefined scan resolution, whether user-selected or set as a default in the scanner's settings. Once the scanning process begins, the resolution cannot be changed, so all scanned pages, no matter how much their contents vary, produce output images of the same size and resolution. For example, if one page has a large area of empty space, the empty space is captured at the same resolution as a text or drawing area where a higher resolution is needed to preserve detail. This can waste memory, since the empty area does not need to be preserved at that high resolution.
In some implementations consistent with this disclosure, different scan resolutions may be determined for different content in a digital image file. Higher scan resolutions preserve more detail and yield higher-quality images, while the quality of scanned images degrades at lower scan resolutions. By segmenting a scanned image into regions of interest, a minimum scanning resolution for each region may be determined, and the resulting file size of the scanned image may be optimized. Consistent with implementations described herein, scanning may refer to an original capture of a document file, such as via an optical scanner and/or camera, and/or rescanning, in a memory of a computing device, of an existing digital file.
As a first stage, a document may be scanned at a fixed resolution prior to segmentation. This initial scanned image may be used to calculate features for each pixel, forming a plurality of feature maps, each corresponding to one particular feature, such as an average gray-scale image, an average gradient, and/or an average saturation. An average gray-scale image may comprise a Gaussian weighted average of all pixel values in a 5×5 window in a converted gray-scale version of the input image, such as by applying a 5×5 Gaussian filter to the gray-scale image obtained from the original color input image. An average gradient may comprise a Gaussian weighted average of the absolute value of the difference between every pair of neighboring pixels in a 5×5 window of the gray-scale image. An average saturation feature may comprise, for each pixel in a 5×5 window, a calculation of the saturation value in the hue, saturation, value (HSV) and/or hue, saturation, lightness (HSL) color system.
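By way of illustration, the gray-scale and saturation feature maps described above might be computed with standard array operations. The following is a minimal sketch in Python; the function names, the use of NumPy/SciPy, and the Gaussian parameters chosen to approximate a 5×5 window are assumptions for illustration rather than elements of any particular implementation.

```python
# Illustrative sketch: Gaussian-weighted ~5x5 feature maps (gray-scale and
# saturation). Parameter choices and function names are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def average_grayscale(rgb):
    """Gaussian-weighted average gray-scale value per pixel (~5x5 window)."""
    gray = rgb[..., :3].mean(axis=2)                   # simple luminance proxy
    # sigma=1.0 with truncate=2.0 yields a 5x5 Gaussian kernel
    return gaussian_filter(gray, sigma=1.0, truncate=2.0)

def average_saturation(rgb):
    """Gaussian-weighted average HSV saturation per pixel (~5x5 window)."""
    maxc = rgb[..., :3].max(axis=2)
    minc = rgb[..., :3].min(axis=2)
    sat = (maxc - minc) / np.maximum(maxc, 1e-6)       # HSV saturation
    return gaussian_filter(sat, sigma=1.0, truncate=2.0)

# Example usage on a synthetic image with values in [0, 1]:
rgb = np.random.rand(64, 64, 3)
gray_map = average_grayscale(rgb)
sat_map = average_saturation(rgb)
```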
These feature maps may be converted into masks in order to classify pixels of the input image into content areas, such as text, non-text, and/or background areas. Bounding boxes may be drawn around pixels of the same content area type for processing and scanning into a final file at optimal resolutions.
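As a further illustration, converting feature maps into content masks and bounding boxes might be sketched as below. The thresholds, the three-way split, and the use of connected-component labeling are assumptions for the example only.

```python
# Illustrative sketch: threshold feature maps into content masks and draw
# bounding boxes around connected regions. Threshold values are placeholders.
import numpy as np
from scipy.ndimage import label, find_objects

def content_masks(gray_map, grad_map, sat_map,
                  bg_thresh=0.85, grad_thresh=0.15, sat_thresh=0.25):
    background = gray_map > bg_thresh                  # light pixels
    text = (~background) & (grad_map > grad_thresh) & (sat_map < sat_thresh)
    non_text = (~background) & (~text)                 # images, graphics, etc.
    return text, non_text, background

def bounding_boxes(mask):
    """Row/column slices bounding each connected region of a boolean mask."""
    labeled, _count = label(mask)
    return find_objects(labeled)

# Example usage with the feature maps from the previous sketch:
#   text_mask, non_text_mask, background_mask = content_masks(g, grad, sat)
#   text_boxes = bounding_boxes(text_mask)
```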
In some implementations, an average gray-scale image may be an effective feature map for differentiating background areas in a scanned document, since dark-colored pixels can be a good indication of main content or non-background pixels. Because a raw scanned image can suffer from scattered noise pixels, the 5×5 Gaussian filter may be applied to the gray-scale image to remove noise.
In some implementations, the gradient for a pixel may be calculated as the average absolute value of the differences between that pixel and its direct neighbors. The average gradient is then the Gaussian weighted average of all the gradients inside a window (e.g., a 5×5 pixel area). The average gradient may have a greater value in a window where pixel values are alternating, which is often the case in text areas. Therefore, greater feature values will be seen in text windows, while smaller feature values will be seen in background areas or areas of solid color.
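A minimal sketch of this average-gradient feature, under the same illustrative assumptions as the earlier feature-map sketch (edge replication at the borders, Gaussian parameters approximating a 5×5 window), might look as follows.

```python
# Illustrative sketch: per-pixel gradient as the mean absolute difference to
# the four direct neighbors, followed by a Gaussian-weighted ~5x5 average.
import numpy as np
from scipy.ndimage import gaussian_filter

def average_gradient(gray):
    padded = np.pad(gray, 1, mode="edge")              # replicate borders
    center = padded[1:-1, 1:-1]
    diffs = [
        np.abs(center - padded[:-2, 1:-1]),            # up
        np.abs(center - padded[2:, 1:-1]),             # down
        np.abs(center - padded[1:-1, :-2]),            # left
        np.abs(center - padded[1:-1, 2:]),             # right
    ]
    gradient = np.mean(diffs, axis=0)
    return gaussian_filter(gradient, sigma=1.0, truncate=2.0)
```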
In some implementations, achromatic colors will have small saturation values. As a result, gray and/or black pixels may have a small saturation. On the other hand, a chromatic color pixel may have a greater saturation value. Therefore, because text pixels are usually gray and/or black, a greater saturation often indicates pixels in a non-text area.
Once the pixels are classified by content area, pixels of the same content type may be grouped together within bounding boxes. These bounded areas may then be analyzed to determine the minimum scanning resolution at which readability and/or clarity may be maintained. Readability may be considered as the ability for a human to read the scanned image without difficulty and/or for a machine to accurately read the scanned image, such as via optical character recognition (OCR). For a non-text area such as an image, readability may be considered as preserving details of the image contents without introducing undue interference or other blurriness.
Once the content areas have been scanned at different resolutions, whether by performing an optical scan of a document at different resolutions and/or resampling an image at different resolutions, the images at two resolutions may be compared to evaluate differences between them. In some implementations, differences that affect readability may comprise merging pixels, breaking pixels, and/or filling pixels. Merging pixels may comprise pixels on different strokes that become connected or overlapped with each other as resolution decreases. Breaking pixels may comprise pixels on the same stroke of a glyph that disappear and result in a breaking stroke as resolution decreases. Filling pixels may comprise pixels that emerge and fill holes (e.g., background regions surrounded by foreground pixels) as resolution decreases. The numbers of the various kinds of differentiating pixels may be counted and used as a measure of degraded readability at lower resolutions. Once the measure reaches a threshold value, a determination may be made that the corresponding resolution is below the minimum to maintain readability and the next highest resolution may be determined to be the minimum scanning resolution. In some implementations, a machine-learning trained support vector machine (SVM) may be used to evaluate the comparisons between resolutions.
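The resolution search described above might be sketched as follows. This example lumps breaking-type changes (foreground pixels lost) and merging/filling-type changes (foreground pixels gained) into simple flip counts, simulates lower resolutions by resampling rather than rescanning, and uses placeholder threshold and foreground values; these simplifications are assumptions for illustration.

```python
# Illustrative sketch: find the lowest candidate DPI whose count of
# differentiating pixels stays under a readability threshold.
import numpy as np
from scipy.ndimage import zoom

def degradation_count(reference, dpi_ref, dpi_low, fg_thresh=0.5):
    """Count pixels that flip between foreground and background states."""
    scale = dpi_low / dpi_ref
    low = zoom(reference, scale, order=1)               # simulate low-DPI scan
    back = zoom(low, 1.0 / scale, order=1)              # back to reference grid
    back = back[:reference.shape[0], :reference.shape[1]]
    pad = [(0, reference.shape[0] - back.shape[0]),
           (0, reference.shape[1] - back.shape[1])]
    back = np.pad(back, pad, mode="edge")
    ref_fg = reference < fg_thresh                      # dark pixels = foreground
    low_fg = back < fg_thresh
    breaking = np.count_nonzero(ref_fg & ~low_fg)       # strokes that vanish
    merging_or_filling = np.count_nonzero(~ref_fg & low_fg)
    return breaking + merging_or_filling

def minimum_resolution(reference, dpi_ref,
                       candidates=(300, 200, 150, 100, 75), threshold=500):
    best = max(candidates)
    for dpi in sorted(candidates, reverse=True):
        if degradation_count(reference, dpi_ref, dpi) <= threshold:
            best = dpi                                   # still readable
        else:
            break                                        # keep previous DPI
    return best
```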
Processor 212 may comprise a central processing unit (CPU), a semiconductor-based microprocessor, a programmable component such as a complex programmable logic device (CPLD) and/or field-programmable gate array (FPGA), or any other hardware device suitable for retrieval and execution of instructions stored in machine-readable storage medium 214. In particular, processor 212 may fetch, decode, and execute instructions 220, 225, 230.
Executable instructions 220, 225, 230 may comprise logic stored in any portion and/or component of machine-readable storage medium 214 and executable by processor 212. The machine-readable storage medium 214 may comprise both volatile and/or nonvolatile memory and data storage components. Volatile components are those that do not retain data values upon loss of power. Nonvolatile components are those that retain data upon a loss of power.
The machine-readable storage medium 214 may comprise, for example, random access memory (RAM), read-only memory (ROM), hard disk drives, solid-state drives, USB flash drives, memory cards accessed via a memory card reader, floppy disks accessed via an associated floppy disk drive, optical discs accessed via an optical disc drive, magnetic tapes accessed via an appropriate tape drive, and/or other memory components, and/or a combination of any two and/or more of these memory components. In addition, the RAM may comprise, for example, static random access memory (SRAM), dynamic random access memory (DRAM), and/or magnetic random access memory (MRAM) and other such devices. The ROM may comprise, for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and/or other like memory device.
Identify content area instructions 220 may identify a plurality of content areas of a document to be scanned.
Determine minimum scanning resolution instructions 225 may, for each of the plurality of content areas, determine a minimum scanning resolution to maintain readability.
In some implementations, the instructions 220 to identify the plurality of content areas of the document to be scanned may comprise instructions to classify each of the plurality of content areas as at least one of the following: a text and/or symbol area, a non-text and/or raster area, and a background and/or vector area. The instructions to classify each of the plurality of content areas may comprise instructions to extract a feature vector for each pixel of the document to be scanned, such as by calculating an average feature vector for a window centered on each of a plurality of pixels associated with the document to be scanned. For example, the average feature vector may comprise at least one of the following: an average gray-scale value, an average gradient value, and an average saturation value.
In some implementations, instructions 220 to identify the plurality of content areas of the document to be scanned may comprise instructions to apply a bounding mask to each of the plurality of content areas. For example, the instructions to determine the minimum scanning resolution to maintain readability may comprise instructions to digitally scan at least one of the plurality of content areas at a plurality of resolutions and determine the lowest of the plurality of resolutions that maintains readability. In some implementations, the instructions to determine the minimum scanning resolution to maintain readability may comprise instructions to evaluate at least one interaction between pixels in the at least one of the plurality of content areas, such as a number of merging pixels, breaking pixels, and/or filling pixels.
Perform scan instructions 230 may perform a scan of the document to a digital file, wherein each of the plurality of content areas is scanned at least at the determined minimum scanning resolution to maintain readability of the respective content area.
Method 300 may begin at stage 305 and advance to stage 310 where device 210 may identify a plurality of content areas of a document to be scanned.
Method 300 may then advance to stage 315 where computing device 210 may classify each of the plurality of content areas into a content type. For example, the content type may comprise at least one of the following: a text area, a non-text area, and a background area.
Method 300 may then advance to stage 320 where computing device 210 may determine a minimum scanning resolution to maintain readability for each of the plurality of content areas according to the classified content type. In some implementations, determining the minimum scanning resolution to maintain readability for each of the plurality of content areas may comprise comparing at least one interaction between pixels at different scanning resolutions in the at least one of the plurality of content areas. Such an interaction between pixels may comprise, for example, a merging pixel, a breaking pixel, and/or a filling pixel. For example, computing device 210 may determine whether a number of breaking pixels within a content area at one resolution (e.g., 150 DPI) is over a threshold amount that indicates loss of readability relative to the same content area at a higher resolution (e.g., 300 DPI).
In some implementations, comparing the at least one interaction between pixels at different scanning resolutions in the at least one of the plurality of content areas may comprise, for a non-text area, calculating a tiled structural similarity index (SSIM). Calculating a structural similarity index may comprise extracting feature vectors from the scanned area at different resolutions. Such feature vectors may comprise, for example, luminance, contrast, structure, etc. To accurately compare two images at different resolutions, tiles of differing pixel areas may be used across each image, such as a 12×12 pixel tile for images scanned at 300 DPI, an 8×8 pixel tile for images scanned at 200 DPI, a 6×6 pixel tile for images scanned at 150 DPI, a 4×4 pixel tile for images scanned at 100 DPI, and/or a 3×3 pixel tile for images scanned at 75 DPI. Other tile sizes and scanning resolutions may be contemplated and used, consistent with implementations described herein. In some implementations, maximum and minimum grayscale values may comprise features extracted for each tile, and/or a standard deviation between grayscale values within the pixels of a tile may be extracted. A mean and/or standard deviation between corresponding tiles at different resolutions may then be used to calculate the similarity index.
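A hedged sketch of such a tiled comparison is shown below. It compares only per-tile mean and standard deviation (a simplification of full SSIM), uses the tile-size pairing listed above so that corresponding tiles cover the same physical area, and assumes 8-bit grayscale values for the conventional SSIM stabilizing constants; these choices are illustrative assumptions.

```python
# Illustrative sketch: tiled SSIM-style similarity between the same region at
# two resolutions, using per-tile mean and standard deviation only.
import numpy as np

TILE_SIZE = {300: 12, 200: 8, 150: 6, 100: 4, 75: 3}   # pixels per tile side

def tile_stats(gray, tile):
    rows, cols = gray.shape[0] // tile, gray.shape[1] // tile
    stats = np.empty((rows, cols, 2))
    for r in range(rows):
        for c in range(cols):
            block = gray[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            stats[r, c] = block.mean(), block.std()
    return stats

def tiled_similarity(gray_hi, dpi_hi, gray_lo, dpi_lo,
                     c1=6.5025, c2=58.5225):            # SSIM constants, 8-bit
    hi = tile_stats(gray_hi, TILE_SIZE[dpi_hi])
    lo = tile_stats(gray_lo, TILE_SIZE[dpi_lo])
    rows, cols = min(hi.shape[0], lo.shape[0]), min(hi.shape[1], lo.shape[1])
    mu_h, sd_h = hi[:rows, :cols, 0], hi[:rows, :cols, 1]
    mu_l, sd_l = lo[:rows, :cols, 0], lo[:rows, :cols, 1]
    luminance = (2 * mu_h * mu_l + c1) / (mu_h ** 2 + mu_l ** 2 + c1)
    contrast = (2 * sd_h * sd_l + c2) / (sd_h ** 2 + sd_l ** 2 + c2)
    return float(np.mean(luminance * contrast))         # 1.0 = near identical
```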
Once the tiles have been generated for each resolution, a number of horizontal and/or vertical transitions—changes between foreground text, imagery, background, and/or edge pixels—may be calculated for each set of corresponding tiles and/or across the entire page. The average change between the number of horizontal and/or vertical transitions across corresponding tiles in different scanning resolutions may be calculated, and if that average exceeds a threshold value, the content area may be deemed unreadable at the lower resolution value. In some implementations, a machine-learning trained support vector machine (SVM) may be used to evaluate the comparisons between resolutions and/or SSIMs.
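The transition-count comparison might be sketched as follows; the binarization of tiles and the relative-change threshold are placeholder assumptions for the example.

```python
# Illustrative sketch: compare horizontal/vertical transition counts between
# corresponding binarized tiles at two resolutions.
import numpy as np

def transitions(tile_binary):
    """Count value changes between adjacent pixels, horizontally and vertically."""
    t = tile_binary.astype(np.int8)
    return (np.count_nonzero(np.diff(t, axis=1)) +
            np.count_nonzero(np.diff(t, axis=0)))

def unreadable_at_lower_dpi(tiles_hi, tiles_lo, threshold=0.30):
    """tiles_hi/tiles_lo: lists of corresponding binarized tiles."""
    changes = []
    for hi, lo in zip(tiles_hi, tiles_lo):
        t_hi, t_lo = transitions(hi), transitions(lo)
        if t_hi:
            changes.append(abs(t_hi - t_lo) / t_hi)     # relative change
    return bool(changes) and float(np.mean(changes)) > threshold
```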
Method 300 may then advance to stage 325 where computing device 210 may perform a scan of the document to a digital file, wherein each of the plurality of content areas is scanned at least at the determined minimum scanning resolution to maintain readability of the respective content area. For example, a document may be scanned by an image scanner with text-based content areas at 300 DPI, non-text content areas at 150 DPI, and background areas at 75 DPI. In some implementations, different content areas of the same type may be scanned and/or resampled at different resolutions, such as having some text-based content areas scanned at 300 DPI while other text-based content areas may be scanned at 150 DPI.
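Composing the final file from per-area resolutions might be sketched as below; the DPI assignments and the resampling of bounding-box regions from a single fixed-resolution capture are assumptions for the example, not the only way the scan could be performed.

```python
# Illustrative sketch: resample each bounded content area at its assigned DPI,
# assuming the page was initially captured at a single reference DPI.
from scipy.ndimage import zoom

ASSIGNED_DPI = {"text": 300, "non_text": 150, "background": 75}  # placeholder

def resample_region(gray_page, box, dpi_page, dpi_target):
    """box: (row_slice, col_slice) bounding a content area on the page."""
    region = gray_page[box]
    return zoom(region, dpi_target / dpi_page, order=1)
```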
Method 300 may then end at stage 350.
Each of engines 420, 425, 430 may comprise any combination of hardware and programming to implement the functionalities of the respective engine. In examples described herein, such combinations of hardware and programming may be implemented in a number of different ways. For example, the programming for the engines may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the engines may include a processing resource to execute those instructions. In such examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement engines 420, 425, 430. In such examples, the device may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate from but accessible to the device and the processing resource.
Content engine 420 may extract a plurality of features from a document to be scanned and identify a plurality of content areas in the document according to the extracted plurality of features.
Resolution engine 425 may determine a minimum scanning resolution to maintain readability for each of the plurality of content areas.
Scanning engine 430 may perform a scan of the document to a digital file, wherein each of the plurality of content areas is scanned at least at the determined minimum scanning resolution to maintain readability of the respective content area.
In the foregoing detailed description of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how examples of the disclosure may be practiced. These examples are described in sufficient detail to allow those of ordinary skill in the art to practice the examples of this disclosure, and it is to be understood that other examples may be utilized and that process, electrical, and/or structural changes may be made without departing from the scope of the present disclosure.