1. Background Field
Embodiments of the subject matter described herein are related generally to processing images to remove noise, and more specifically to automatically processing images having text to remove noise.
2. Relevant Background
Optical character readers (OCRs) conventionally convert images into machine-encoded text. Noise in images, however, may degrade the performance of OCR systems. Morphology techniques have been developed to remove noise, such as background texture or “speckle.” Conventional noise removal techniques are generally effective, but often modify the text in the resulting documents, sometimes making the text less readable.
The noise in an image having text is removed by convolving a shaped kernel centered on a pixel for each pixel in the image. The shaped kernel has a shape configured to identify pixels that are not part of the text. For example, the shaped kernel may be shaped with zeros in a center of the kernel and ones everywhere else to identify isolated pixels that are not part of the text. A value for the pixel is set to erase the pixel when the result of convolving the kernel with the patch of the same size around the pixel of interest is less than a threshold. The process may be repeated multiple times for differently shaped kernels, including kernels of different sizes and different configurations, such as having values greater than one in at least one of a row, column, and diagonal.
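The differently configured kernels mentioned above can be sketched as follows. This is a minimal illustration, not the implementation of the described method: the function name `weighted_kernel`, its parameters, and the weight values are hypothetical, and an odd kernel size with a single zero at the center is assumed.

```python
def weighted_kernel(size, weight, lines=("row", "column", "diagonal")):
    """Build a size x size kernel of ones with a zero at the center.

    Entries on the chosen lines through the center (middle row, middle
    column, both diagonals) are set to `weight`, a value greater than one.
    Assumes `size` is odd so that a single center entry exists.
    """
    mid = size // 2
    kernel = [[1] * size for _ in range(size)]
    for i in range(size):
        if "row" in lines:
            kernel[mid][i] = weight
        if "column" in lines:
            kernel[i][mid] = weight
        if "diagonal" in lines:
            kernel[i][i] = weight
            kernel[i][size - 1 - i] = weight
    kernel[mid][mid] = 0  # the pixel under test never contributes to its own sum
    return kernel


row_kernel = weighted_kernel(3, 2, lines=("row",))
diagonal_kernel = weighted_kernel(5, 3, lines=("diagonal",))
```

With such a kernel, neighbors lying along the weighted line contribute more to the convolution value, so a pixel on a thin stroke in that direction is less likely to fall below the threshold.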
In one implementation, a method of removing noise in an image that includes text includes receiving the image that includes the text; convolving a shaped kernel centered on each of a plurality of subsets of pixels in the image to produce a convolution value for each subset of pixels in the plurality of subsets of pixels, the shaped kernel having a shape configured to identify subsets of pixels that are not part of the text; setting a value to erase a subset of pixels when the convolution value for the subset of pixels is less than a threshold to generate a filtered image; and producing the filtered image.
In one implementation, an apparatus to remove noise in an image that includes text includes an image interface to receive the image that includes the text; and a processor coupled to receive the image, the processor being configured to convolve a shaped kernel centered on each of a plurality of subsets of pixels in the image to produce a convolution value for each subset of pixels in the plurality of subsets of pixels, the shaped kernel having a shape configured to identify subsets of pixels that are not part of the text; to set a value to erase a subset of pixels when the convolution value for the subset of pixels is less than a threshold to generate a filtered image; and to produce the filtered image.
In one implementation, an apparatus to remove noise in an image that includes text includes means for receiving the image that includes the text; means for convolving a shaped kernel centered on each of a plurality of subsets of pixels in the image to produce a convolution value for each subset of pixels in the plurality of subsets of pixels, the shaped kernel having a shape configured to identify subsets of pixels that are not part of the text; means for setting a value to erase a subset of pixels when the convolution value for the subset of pixels is less than a threshold to generate a filtered image; and means for producing the filtered image.
In one implementation, a storage medium including program code stored thereon includes program code to receive an image that includes text; program code to convolve a shaped kernel centered on each of a plurality of subsets of pixels in the image to produce a convolution value for each subset of pixels in the plurality of subsets of pixels, the shaped kernel having a shape configured to identify subsets of pixels that are not part of the text; program code to set a value to erase a subset of pixels when the convolution value for the subset of pixels is less than a threshold to generate a filtered image; and program code to produce the filtered image.
The noise reduction techniques described herein may be used on optically scanned images having text. The scanned images may be obtained with a desktop scanner, hand-held scanner, digital camera, or any other manner in which text is converted to a digital image. The noise, e.g., speckle, in the image may be reduced by identifying and removing pixels “isolated” from the text in the image, while leaving the pixels that are part of the text. Thus, pixels that are part of letters or punctuation in the text are not affected. Accordingly, the noise reduction techniques described herein reduce noise but do not adversely affect the text that is contained in the image.
where T(x,y) is the weighted sum (e.g., Gaussian window) of a neighborhood of (x,y). The dst image is used as a mask for copying the src image to the scanned image. A close morphology operation may be performed on the resulting scanned image to reduce noise, and an open morphology operation may be performed to connect nearby regions. Noise reduction may then be performed to remove “isolated” patches of noise.
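The morphology operations mentioned above can be sketched as follows. This is a minimal illustration under assumptions that do not come from the description: a binary image in which 1 marks ink and 0 marks background, a fixed 3×3 structuring element, and neighborhoods clipped at the image border. The function names are hypothetical. Under this polarity, opening (erosion then dilation) removes specks smaller than the structuring element and closing (dilation then erosion) fills small gaps; with the opposite polarity the two roles swap.

```python
def _neighborhood(image, y, x):
    """Yield the values in the 3x3 neighborhood of (y, x), clipped to the image."""
    height, width = len(image), len(image[0])
    for ny in range(max(y - 1, 0), min(y + 2, height)):
        for nx in range(max(x - 1, 0), min(x + 2, width)):
            yield image[ny][nx]


def dilate(image):
    """A pixel becomes 1 if any pixel in its 3x3 neighborhood is 1."""
    return [[int(any(_neighborhood(image, y, x))) for x in range(len(image[0]))]
            for y in range(len(image))]


def erode(image):
    """A pixel stays 1 only if every pixel in its 3x3 neighborhood is 1."""
    return [[int(all(_neighborhood(image, y, x))) for x in range(len(image[0]))]
            for y in range(len(image))]


def closing(image):
    """Dilation followed by erosion."""
    return erode(dilate(image))


def opening(image):
    """Erosion followed by dilation."""
    return dilate(erode(image))


# A 3x3 block of ink with a one-pixel hole in the middle of a 7x7 image.
ring = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        ring[y][x] = 1
ring[3][3] = 0

# A single isolated pixel in a 5x5 image.
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1

closed = closing(ring)   # the hole at (3, 3) is filled
opened = opening(speck)  # the isolated pixel is removed
```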
To perform the noise reduction, for each subset of pixels in the scanned image, a shaped kernel centered on the subset of pixels is convolved to produce a convolution value for the pixel (104). The shaped kernel has a shape configured to identify subsets of pixels that are not part of the text. For example, the shaped kernel may be shaped with zeros in the center of the kernel to identify whether a subset of pixels at the center of the shaped kernel is unlikely to be part of the text. The subset of pixels may be a single pixel or may include more than one pixel. For the sake of simplicity, the subset of pixels will be assumed herein to be a single pixel, and thus the subset of pixels will be sometimes referred to as a pixel or the pixel. Nevertheless, it should be understood that if desired, the subset of pixels may include more than one pixel.
$$s(x,y)=\sum_{(x',y')\in \mathrm{patch}(x,y)} I(x',y')\,K(x',y') \qquad \text{(eq. 2)}$$
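The sum in eq. 2 can be sketched in code as follows. This is a minimal illustration, assuming a binary image stored as a list of rows in which 1 marks ink and 0 marks background, an odd-sized kernel, and a pixel far enough from the border that the whole patch lies inside the image. The function names are hypothetical. Eq. 2 multiplies corresponding entries without flipping the kernel; for the symmetric kernels used here that gives the same result as a true convolution.

```python
def make_shaped_kernel(size, hole):
    """Return a size x size kernel of ones with a hole x hole block of zeros at the center."""
    start = (size - hole) // 2
    return [[0 if start <= r < start + hole and start <= c < start + hole else 1
             for c in range(size)]
            for r in range(size)]


def convolution_value(image, kernel, y, x):
    """Evaluate eq. 2 for the patch centered on pixel (y, x).

    I is `image`, K is `kernel`, and the patch has the same size as the kernel.
    """
    half = len(kernel) // 2
    total = 0
    for r, kernel_row in enumerate(kernel):
        for c, weight in enumerate(kernel_row):
            total += image[y - half + r][x - half + c] * weight
    return total


# Assumed polarity: 1 = ink, 0 = background.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
kernel = make_shaped_kernel(3, 1)

s_isolated = convolution_value(image, kernel, 1, 1)   # 0: no ink around the pixel
s_connected = convolution_value(image, kernel, 2, 3)  # 3: three neighboring ink pixels
```

Because the center of the kernel is zero, the pixel under test does not contribute to its own sum, so the value counts only the surrounding ink.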
A pixel that is too near to an edge of the image to be centered with the shaped kernel may be ignored or the edge of the image may be extended by replicating the pixels along the edge or by extending the edge with a constant intensity so that the shaped kernel may be applied to the desired pixel. A convolution value for the pixel may then be produced in the same manner discussed above.
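The two edge-extension options described above can be sketched as follows. This is a minimal illustration; the function name `pad_image` and its parameters are hypothetical.

```python
def pad_image(image, pad, mode="replicate", constant=0):
    """Return a copy of image extended by `pad` pixels on every side.

    mode="replicate" repeats the nearest edge pixel;
    mode="constant" fills the added border with `constant`.
    """
    if mode not in ("replicate", "constant"):
        raise ValueError("mode must be 'replicate' or 'constant'")
    height, width = len(image), len(image[0])
    padded = []
    for y in range(-pad, height + pad):
        row = []
        for x in range(-pad, width + pad):
            inside = 0 <= y < height and 0 <= x < width
            if inside or mode == "replicate":
                # Clamp the coordinates to the nearest valid pixel.
                cy = min(max(y, 0), height - 1)
                cx = min(max(x, 0), width - 1)
                row.append(image[cy][cx])
            else:
                row.append(constant)
        padded.append(row)
    return padded


small = [[1, 2],
         [3, 4]]
replicated = pad_image(small, 1, mode="replicate")
constant_border = pad_image(small, 1, mode="constant", constant=0)
```

Padding by half the kernel size lets the kernel be centered on every pixel of the original image; the filtered result can then be cropped back to the original dimensions.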
A value for the subset of pixels is set to erase the subset of pixels when the convolution value for the subset of pixels is less than a threshold (106) to generate a filtered image. In other words, if s(x,y)<threshold, pixels of the image at the center of the kernel are set to be erased, e.g., white if the scanned image has a white background or black if the scanned image has a black background. The threshold may be selected based on an average intensity for the image or in any other desired manner and may be determined empirically.
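The thresholding step can be sketched as follows. This is a minimal illustration, assuming a binary image in which 1 marks ink and 0 marks background, a single-pixel subset, and an odd-sized kernel. The function name and the threshold value of 1 are hypothetical; the description leaves the threshold to be chosen empirically. Pixels too near the border for the kernel to fit are left unchanged, which is one of the options described above.

```python
def erase_isolated_pixels(image, kernel, threshold, background=0):
    """Return a filtered copy of image.

    A pixel is set to `background` when its convolution value s(x, y)
    is less than `threshold`. Values are read from the input image and
    written to a copy, so earlier erasures do not affect later sums.
    """
    size = len(kernel)
    half = size // 2
    height, width = len(image), len(image[0])
    filtered = [row[:] for row in image]
    for y in range(half, height - half):
        for x in range(half, width - half):
            s = sum(image[y - half + r][x - half + c] * kernel[r][c]
                    for r in range(size)
                    for c in range(size))
            if s < threshold:
                filtered[y][x] = background
    return filtered


kernel = [[1, 1, 1],
          [1, 0, 1],
          [1, 1, 1]]
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
filtered = erase_isolated_pixels(image, kernel, threshold=1)
```

In this example the lone pixel at row 1, column 1 is erased, while the 2×2 group of ink pixels is kept because each of its pixels has neighboring ink.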
The process may be repeated for the image using one or more shaped kernels with different dimensions. For example, a shaped kernel that is 3×3 with a 0 value in the center may be used in addition to the 8×8 kernel with 0's in the 2×2 center region. Thus, as illustrated in
When all the shaped kernels have been used (108), the resulting filtered image may be produced (112). For example, the filtered image may be produced by a computer, e.g., by displaying the resulting image, printing the resulting image, or storing the resulting image in memory.
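Repeating the process with several shaped kernels can be sketched as follows, using the 3×3 kernel with a zero center and the 8×8 kernel with a 2×2 zero center mentioned above. This is a minimal illustration under several assumptions that are not taken from the description: the image is binary with 1 marking ink and 0 marking background, each pass runs on the output of the previous pass, the subset of pixels is slid one pixel at a time, subsets too near the border are skipped, and the thresholds are illustrative values. The function names are hypothetical.

```python
def shaped_kernel(size, hole):
    """Ones everywhere except a hole x hole block of zeros at the center."""
    start = (size - hole) // 2
    return [[0 if start <= r < start + hole and start <= c < start + hole else 1
             for c in range(size)]
            for r in range(size)]


def filter_with_kernel(image, size, hole, threshold, background=0):
    """One pass: erase each hole x hole subset whose surrounding sum is below threshold."""
    kernel = shaped_kernel(size, hole)
    margin = (size - hole) // 2
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    # (top, left) is the upper-left pixel of the subset under the kernel's zero block.
    for top in range(margin, height - margin - hole + 1):
        for left in range(margin, width - margin - hole + 1):
            s = sum(image[top - margin + r][left - margin + c] * kernel[r][c]
                    for r in range(size)
                    for c in range(size))
            if s < threshold:
                for r in range(hole):
                    for c in range(hole):
                        result[top + r][left + c] = background
    return result


def denoise(image, passes):
    """Apply each (size, hole, threshold) pass in turn and return the filtered image."""
    for size, hole, threshold in passes:
        image = filter_with_kernel(image, size, hole, threshold)
    return image


image = [[0] * 20 for _ in range(10)]
for y in range(1, 9):
    image[y][3] = 1          # vertical stroke standing in for text
image[1][17] = 1             # single-pixel speck
for y in (4, 5):
    for x in (12, 13):
        image[y][x] = 1      # 2x2 speck

filtered = denoise(image, passes=[(3, 1, 1), (8, 2, 1)])
```

In this example the 3×3 pass removes the single-pixel speck, the 8×8 pass removes the 2×2 speck, and the stroke survives both passes because every subset that overlaps it has ink in the surrounding region.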
Identifying and removing isolated pixels in a scanned image with text using shaped kernels is advantageous because it requires only a few simple shaped kernels that may be used for many applications. In contrast, identifying and retaining pixels that are part of the text in a scanned image may require many different shaped kernels to identify the different possible text shapes. Adequately reducing noise with that approach may therefore be problematic in applications where the text can vary from a uniform shape, e.g., with handwriting, logos, or various fonts.
The apparatus 200 also includes a control unit 280 that is connected to and communicates with the scanned image interface 202. The control unit 280 may be provided by a bus 280b, processor 281 and associated memory 284, hardware 282, software 285, and firmware 283. The control unit 280 may include morphology unit 292, a shaped kernel production unit 294, a convolution unit 296 and a thresholding unit 298, which operate as discussed above.
The various processing units, e.g., morphology unit 292, shaped kernel production unit 294, convolution unit 296 and thresholding unit 298, are illustrated separately from processor 281 for clarity, but may be part of the processor 281 or implemented in the processor based on instructions in the software 285 which is run in the processor 281. It will be understood as used herein that the processor 281, and/or the various processing units, can, but need not necessarily include, one or more microprocessors, embedded processors, controllers, application specific integrated circuits (ASICs), digital signal processors (DSPs), and the like. The term processor is intended to describe the functions implemented by the system rather than specific hardware. Moreover, as used herein the term “memory” refers to any type of computer storage medium, including long term, short term, or other memory associated with the apparatus, and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.
The methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware 282, firmware 283, software 285, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any machine-readable medium tangibly embodying instructions may be used in implementing the methodologies described herein. For example, software codes may be stored in memory 284 and executed by the processor 281. Memory may be implemented within or external to the processor 281. If implemented in firmware and/or software, the functions may be stored as one or more instructions or code on a storage medium that is computer-readable, wherein the storage medium does not include transitory propagating signals. Examples include storage media encoded with a data structure and storage media encoded with a computer program. Storage media includes physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer; disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Thus, an apparatus to remove noise in an image that includes text may include means for receiving the image that includes the text, which may be, e.g., the scanned image interface 202. A means for convolving a shaped kernel centered on each of a plurality of subsets of pixels in the image to produce a convolution value for each subset of pixels in the plurality of subsets of pixels, the shaped kernel having a shape configured to identify subsets of pixels that are not part of the text, may be, e.g., the convolution unit 296 with the shaped kernel production unit 294 or processor 281 performing instructions received from software 285. Means for setting a value to erase a subset of pixels when the convolution value for the subset of pixels is less than a threshold to generate a filtered image may be, e.g., the thresholding unit 298 or processor 281 performing instructions received from software 285. Means for producing the filtered image may be, e.g., the scanned image interface 202 or processor 281 causing the filtered image to be stored in memory 284. Means for convolving a second shaped kernel centered on each of the plurality of subsets of pixels to produce a second convolution value for each subset of pixels, the second shaped kernel having a shape configured to assign higher weights to subsets of pixels that are likely to be part of the alphabet used in the text or other shapes in the image that are desired to be preserved, may be, e.g., the convolution unit 296 with the shaped kernel production unit 294 or processor 281 performing instructions received from software 285.
Means for repeatedly convolving different shaped kernels centered on each subset of pixels to produce a plurality of convolution values and means for setting the value to erase the subset of pixels when any of the plurality of convolution values is less than any threshold associated with any of the different shaped kernels may be, e.g., the convolution unit 296 with the shaped kernel production unit 294 and the thresholding unit 298 or processor 281 performing instructions received from software 285.
By way of illustration,
Although the present invention is illustrated in connection with specific embodiments for instructional purposes, the present invention is not limited thereto. Various adaptations and modifications may be made without departing from the scope of the invention. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description.
This application claims priority under 35 USC 119 to U.S. Provisional Application No. 61/642,318, filed May 3, 2012, entitled “Noise Removal From Images Containing Text,” which is assigned to the assignee hereof and which is incorporated herein by reference.
Number | Date | Country |
---|---|---|
20130294693 A1 | Nov 2013 | US |

Number | Date | Country |
---|---|---|
61642318 | May 2012 | US |