Methods and systems for detecting pictorial regions in digital images

Information

  • Patent Grant
  • 8630498
  • Patent Number
    8,630,498
  • Date Filed
    Thursday, June 15, 2006
  • Date Issued
    Tuesday, January 14, 2014
  • CPC
    • H04N7/50
  • US Classifications
    Field of Search
    • US
    • 382 232000
  • International Classifications
    • G06K9/36
    • H04N7/50
    • Term Extension
      703
Abstract
Embodiments of the present invention comprise systems, methods and devices for detection of pictorial regions in an image using a masking condition, an entropy measure, and region growing.
Description
FIELD OF THE INVENTION

Embodiments of the present invention comprise methods and systems for detecting pictorial regions in digital images.


BACKGROUND

The content of a digital image can have considerable impact on the compression of the digital image, both in terms of compression efficiency and compression artifacts. Pictorial regions in an image are not efficiently compressed using compression algorithms designed for the compression of text. Similarly, text images are not efficiently compressed using compression algorithms that are designed and optimized for pictorial content. Not only is compression efficiency affected when a compression algorithm designed for one type of image content is used on a different type of image content, but the decoded image may exhibit visible compression artifacts.


Further, image enhancement algorithms designed to sharpen text, if applied to pictorial image content, may produce visually annoying artifacts in some areas of the pictorial content. In particular, pictorial regions containing strong edges may be affected. While smoothing operations may enhance a natural image, the smoothing of text regions is seldom desirable.


The detection of regions of a particular content type in a digital image can improve compression efficiency, reduce compression artifacts, and improve image quality when used in conjunction with a compression algorithm or image enhancement algorithm designed for the particular type of content.


The semantic labeling of image regions based on content is also useful in document management systems and image databases.


Reliable and efficient detection of regions of pictorial content and other image regions in digital images is desirable.


SUMMARY

Embodiments of the present invention comprise methods and systems for identifying pictorial regions in a digital image using a masked entropy feature and region growing.


The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention taken in conjunction with the accompanying drawings.





BRIEF DESCRIPTION OF THE SEVERAL DRAWINGS


FIG. 1 is an example of an image comprising a multiplicity of regions of different content type;



FIG. 2 is a diagram of an exemplary region-detection system (prior art);



FIG. 3 is an exemplary histogram showing feature value separation;



FIG. 4 is an exemplary histogram showing feature value separation;



FIG. 5 is a diagram showing exemplary embodiments of the present invention comprising a masked-entropy calculation from a histogram;



FIG. 6 is a diagram showing an exemplary embodiment of masked-image generation;



FIG. 7 is a diagram showing an exemplary embodiment of histogram generation;



FIG. 8 is a diagram showing exemplary embodiments of the present invention comprising masking, quantization, histogram generation and entropy calculation;



FIG. 9 is a diagram showing exemplary embodiments of the present invention comprising multiple quantization of select data and multiple entropy calculations;



FIG. 10 is a diagram showing exemplary embodiments of the present invention comprising multiple quantization of select data;



FIG. 11 is a diagram showing pixel classification comprising an image window;



FIG. 12 is a diagram showing block classification comprising an image window;



FIG. 13 is a diagram showing exemplary embodiments of the present invention comprising lobe-based histogram modification;



FIG. 14 is a diagram showing exemplary embodiments of the present invention comprising pixel selection logic using multiple mask input;



FIG. 15 is a diagram showing exemplary embodiments of the present invention comprising a masked-entropy calculation from a histogram using confidence levels;



FIG. 16 is a diagram showing an exemplary embodiment of masked-image generation using confidence levels;



FIG. 17 is a diagram showing an exemplary embodiment of histogram generation using confidence levels;



FIG. 18 is a diagram showing exemplary embodiments of the present invention comprising refinement and verification;



FIG. 19 is a diagram showing exemplary embodiments of the present invention comprising region growing from pictorial-region seeds; and



FIG. 20 shows an exemplary pictorial region.





DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Embodiments of the present invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The figures listed above are expressly incorporated as part of this detailed description.


It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the methods and systems of the present invention is not intended to limit the scope of the invention but it is merely representative of the presently preferred embodiments of the invention.


Elements of embodiments of the present invention may be embodied in hardware, firmware and/or software. While exemplary embodiments revealed herein may only describe one of these forms, it is to be understood that one skilled in the art would be able to effectuate these elements in any of these forms while resting within the scope of the present invention.



FIG. 1 shows an image 10 comprising three regions: a pictorial region 12, a text region 14, and a graphics region 16. For many image processing, compression, document management, and other applications, it may be desirable to detect various regions in an image. Exemplary regions may include: a pictorial region, a text region, a graphics region, a half-tone region, a continuous-tone region, a color region, a black-and-white region, a region best compressed by Joint Photographic Experts Group (JPEG) compression, a region best compressed by Joint Bi-level Image Experts Group (JBIG) compression, a background region, and a foreground region.


An exemplary region-detection system 20 is shown in FIG. 2. A region-detection system 20 may include a feature extractor 22 and a classifier 24. The feature extractor 22 may measure, calculate, or in some way extract, a feature or features 23 from a digital image 21. The classifier 24 may classify portions of the image 21 based on the extracted feature or features 23. The classification 25 produced by the classifier 24 thereby provides detection of image regions and segmentation of the digital image 21.


The effectiveness and reliability of a region-detection system may depend on the feature or features used for the classification. FIG. 3 shows an example of normalized frequency-of-occurrence plots of the values of a feature for two different image regions. The solid line 32 shows the frequency of occurrence of feature values extracted from image samples belonging to one region. The dashed line 34 shows the frequency of occurrence of feature values extracted from image samples belonging to a second region. The strong overlap of these two curves may indicate that the feature may not be an effective feature for separating image samples belonging to one of these two regions.



FIG. 4 shows another example of normalized frequency-of-occurrence plots of the values of a feature for two different image regions. The solid line 42 shows the frequency of occurrence of feature values extracted from image samples belonging to one region. The dashed line 44 shows the frequency of occurrence of feature values extracted from image samples belonging to a second region. The wide separation of these two curves may indicate that the feature will be an effective feature for classifying image samples as belonging to one of these two regions.


For the purposes of this specification, associated claims, and included drawings, the term histogram will be used to refer to frequency-of-occurrence information in any form or format, for example, that represented as an array, a plot, a linked list and any other data structure associating a frequency-of-occurrence count of a value, or group of values, with the value, or group of values. The value, or group of values, may be related to an image characteristic, for example, color (luminance or chrominance), edge intensity, edge direction, texture, and any other image characteristic.


Embodiments of the present invention comprise methods and systems for region detection in a digital image. Some embodiments of the present invention comprise methods and systems for region detection in a digital image wherein the separation between feature values corresponding to image regions may be accomplished by masking, prior to feature extraction, pixels in the image for which a masking condition is met. In some embodiments, the masked pixel values may not be used when extracting the feature value from the image.


In some exemplary embodiments of the present invention shown in FIG. 5, a masked image 51 may be formed 52 from an input image 50. The masked image 51 may be formed 52 by checking a masking condition at each pixel in the input image 50. An exemplary embodiment shown in FIG. 6 illustrates the formation of the masked image. If an input-image pixel 60 satisfies 62 the masking condition, the value of the pixel at the corresponding location in the masked image may be assigned 66 a value, which may be called a mask-pixel value, indicating that the masking condition is satisfied at that pixel location in the input image. If an input-image pixel 60 does not satisfy 64 the masking condition, the value of the pixel at the corresponding location in the masked image may be assigned the value of the input pixel in the input image 68. The masked image thereby masks pixels in the input image for which a masking condition is satisfied.
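The masked-image formation of FIG. 6 may be sketched in Python as follows. This is a minimal illustration only; the sentinel `MASK_VALUE`, the function name `form_masked_image`, and the callable `masking_condition` are hypothetical names chosen for the example and do not appear in the described embodiments.

```python
MASK_VALUE = -1  # hypothetical mask-pixel value; must not collide with real pixel values


def form_masked_image(image, masking_condition):
    """Return a copy of `image` in which every pixel satisfying the
    masking condition is replaced by MASK_VALUE.

    `image` is a list of rows of integer pixel values.
    `masking_condition(image, r, c)` returns True where the pixel is masked.
    """
    return [
        [MASK_VALUE if masking_condition(image, r, c) else pixel
         for c, pixel in enumerate(row)]
        for r, row in enumerate(image)
    ]


# Example: mask every pixel brighter than 128 (an arbitrary stand-in condition).
example = form_masked_image([[10, 200], [30, 250]],
                            lambda img, r, c: img[r][c] > 128)
```

Any condition may be substituted for the brightness test used here, for example a test on edge strength at the pixel.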


In the exemplary embodiments of the present invention shown in FIG. 5, after forming 52 the masked image 51, a histogram 53 may be generated 54 for a block, also considered a segment, section, or any division, not necessarily rectangular in shape, of the masked image 51. For the purposes of this specification, associated claims, and included drawings, the term block will be used to describe a portion of data of any shape including, but not limited to, square, rectangular, circular, elliptical, or approximately circular.



FIG. 7 shows an exemplary embodiment of histogram formation 54. A histogram with bins corresponding to the possible pixel values of the masked image may be formed according to FIG. 7. In some embodiments, all bins may be initially considered empty with initial count zero. The value of a pixel 70 in the block of the masked image may be compared 71 to the mask-pixel value. If the value of the pixel 70 is equal 72 to the mask-pixel value, then the pixel is not accumulated in the histogram, meaning that no histogram bin is incremented, and if there are pixels remaining in the block to examine 76, then the next pixel in the block is examined 71. If the value of the pixel 70 is not equal 73 to the mask-pixel value, then the pixel is accumulated in the histogram 74, meaning that the histogram bin corresponding to the value of the pixel is incremented, and if there are pixels remaining in the block to examine 77, then the next pixel is examined 71.


When a pixel is accumulated in the histogram 74, a counter for counting the number of non-mask pixels in the block of the masked image may be incremented 75. When all pixels in a block have been examined 78, 79, the histogram may be normalized 69. The histogram may be normalized 69 by dividing each bin count by the number of non-mask pixels in the block of the masked image. In alternate embodiments, the histogram may not be normalized and the counter may not be present.
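The accumulation and normalization steps of FIG. 7 may be sketched as follows. The sketch assumes integer pixel values that index histogram bins directly and a hypothetical sentinel `MASK_VALUE`; both are assumptions made for illustration.

```python
MASK_VALUE = -1  # hypothetical mask-pixel value


def masked_histogram(block, num_bins, normalize=True):
    """Accumulate non-mask pixels of `block` into a histogram.

    Returns (histogram, non_mask_count). Masked pixels increment no bin.
    When `normalize` is True, each bin count is divided by the number
    of non-mask pixels in the block.
    """
    hist = [0.0] * num_bins
    non_mask_count = 0
    for row in block:
        for pixel in row:
            if pixel == MASK_VALUE:
                continue  # masked: no bin is incremented
            hist[pixel] += 1
            non_mask_count += 1
    if normalize and non_mask_count > 0:
        hist = [count / non_mask_count for count in hist]
    return hist, non_mask_count


hist, n = masked_histogram([[0, 1], [MASK_VALUE, 1]], num_bins=4)
```

A block consisting entirely of masked pixels yields an all-zero histogram and a count of zero in this sketch.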


Alternately, the masked image may be represented in two components: a first component that is a binary image, also considered a mask, in which masked pixels may be represented by one of the bit values and unmasked pixels by the other bit value, and a second component that is the digital image. The logical combination of the mask and the digital image forms the masked image. The histogram formation may be accomplished using the two components of the masked image in combination.


An entropy measure 55 may be calculated 56 for the histogram 53 of a block of the masked image. The entropy measure 55 may be considered an image feature of the input image. The entropy measure 55 may be considered any measure of the form:







$$-\sum_{i=1}^{N} h(i) \cdot f(h(i)),$$





where N is the number of histogram bins, h(i) is the accumulation or count of bin i, and f(·) may be a function with mathematical characteristics similar to a logarithmic function. The entropy measure 55 may be weighted by the proportion of pixels that would have been counted in a bin, but were masked. The weighted entropy measure may be of the form:






$$-\sum_{i=1}^{N} w(i) \cdot h(i) \cdot f(h(i)),$$









where w(i) is the weighting function. In some embodiments of the present invention, the function f(h(i)) may be log2(h(i)).
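The weighted and unweighted entropy measures above may be sketched with a single function. The function name `entropy_measure` and its parameters are hypothetical; empty bins are skipped here so that they contribute zero, which is one way of handling the logarithm of zero.

```python
import math


def entropy_measure(hist, weights=None, f=math.log2):
    """Compute -sum_i w(i) * h(i) * f(h(i)) over the histogram bins.

    `hist` holds normalized bin counts h(i). `weights` holds w(i) and
    defaults to all ones, giving the unweighted measure. Empty bins are
    skipped so they contribute nothing to the sum.
    """
    if weights is None:
        weights = [1.0] * len(hist)
    total = 0.0
    for w, h in zip(weights, hist):
        if h > 0:
            total += w * h * f(h)
    return -total


two_equal_bins = entropy_measure([0.5, 0.5])
single_bin = entropy_measure([1.0, 0.0])
```

With f chosen as the base-2 logarithm, a histogram concentrated in one bin gives a measure of zero, and a histogram spread evenly over two bins gives a measure of one.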


In the embodiments of the present invention shown in FIG. 5, after calculating 56 the entropy measure 55 for the histogram 53 corresponding to a block of the image centered at a pixel, the pixel may be classified 57 according to the entropy feature 55. In some embodiments, the classifier 57 may be based on thresholding. A threshold may be determined a priori, adaptively, or by any of numerous methods. The pixel may be classified 57 as belonging to one of two regions depending on which side of the threshold the entropy measure 55 falls.


In some embodiments of the present invention shown in FIG. 8, a digital image 80 and a corresponding mask image 81 may be combined 82 to form masked data 83. The masked data 83 may be quantized 84 forming quantized, masked data 85. The histogram 87 of the quantized, masked data 85 may be generated 86, and an entropy measure 89 may be calculated 88 using the histogram of the quantized, masked data 87. The computational expense of the histogram generation 86 and the entropy calculation 88 may depend on the level, or degree, of quantization of the masked data. The number of histogram bins may depend on the number of quantization levels, and the number of histogram bins may influence the computational expense of the histogram generation 86 and the entropy calculation 88. Due to scanning noise and other factors, uniform areas in a document may not correspond to a single color value in a digital image of the document. In some embodiments of the present invention shown in FIG. 8, the degree of quantization may be related to the expected amount of noise for a uniformly colored area on the document. In some embodiments, the quantization may be uniform. In alternate embodiments, the quantization may be variable. In some embodiments, the quantization may be related to a power of two. In some embodiments in which the quantization is related to a power of two, quantization may be implemented using shifting.
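Power-of-two quantization by shifting may be sketched as follows. The sketch assumes 8-bit data and a hypothetical sentinel `MASK_VALUE` that is passed through unchanged; the function names are illustrative only.

```python
MASK_VALUE = -1  # hypothetical mask-pixel value, preserved by quantization


def quantize_masked_data(values, shift):
    """Uniformly quantize integer values by discarding `shift` low bits.

    Each quantization level then spans 2**shift input values, so noise
    smaller than that span tends to fall into a single histogram bin.
    """
    return [v if v == MASK_VALUE else v >> shift for v in values]


def number_of_bins(bit_depth, shift):
    """Number of histogram bins needed after shifting `bit_depth`-bit data."""
    return 1 << (bit_depth - shift)


quantized = quantize_masked_data([0, 31, 32, 200, 255, MASK_VALUE], shift=5)
```

With a shift of five bits, 8-bit data falls into eight bins, which reduces the work in both histogram generation and entropy calculation.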


In some embodiments of the present invention, the masked data may not be quantized, but the number of histogram bins may be less than the number of possible masked data values. In these embodiments, a bin in the histogram may represent a range of masked data values.


In some embodiments of the present invention shown in FIG. 9, quantization 90, 91, histogram generation 92, and calculation of entropy 94 may be performed multiple times on the masked data 83 formed by the combination 82 of the digital image 80 and the corresponding mask image 81. The masked data may be quantized using different quantization methods 90, 91. In some embodiments, the different quantization methods may correspond to different levels of quantization. In some embodiments, the different quantization methods may be of the same level of quantization with histogram bin boundaries shifted. In some embodiments, the histogram bin boundaries may be shifted by one-half of a bin width. A histogram may be generated 92 from the data produced by each quantization method 90, 91, and an entropy calculation 94 may be made for each histogram. The multiple entropy measures produced may be combined 96 to form a single measure 97. The single entropy measure may be the average, the maximum, the minimum, a measure of the variance, or any other combination of the multiple entropy measures.
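One possible reading of the multiple-quantization embodiments of FIG. 9 is sketched below: the same data is binned twice at the same bin width, once with bin boundaries shifted by one-half of a bin width, and the two entropy values are combined. The function names and the choice of `min` as the default combiner are assumptions made for illustration.

```python
import math


def entropy(hist):
    """Entropy in bits of an unnormalized histogram (empty bins skipped)."""
    total = sum(hist)
    return -sum((h / total) * math.log2(h / total) for h in hist if h > 0)


def quantized_histogram(values, bin_width, offset=0, max_value=255):
    """Histogram with bins of `bin_width`, boundaries shifted by `offset`."""
    hist = [0] * ((max_value + offset) // bin_width + 1)
    for v in values:
        hist[(v + offset) // bin_width] += 1
    return hist


def combined_entropy(values, bin_width, combine=min):
    """Entropy under two quantizations whose bin boundaries differ by half a bin."""
    e_plain = entropy(quantized_histogram(values, bin_width, 0))
    e_shifted = entropy(quantized_histogram(values, bin_width, bin_width // 2))
    return combine([e_plain, e_shifted])


# Values straddling a bin boundary at 32 look non-uniform only for one quantization.
straddling = [30, 31, 32, 33]
```

In this example the unshifted quantization splits the four nearly equal values across two bins, while the shifted quantization places them in one bin, so taking the minimum treats the area as uniform.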


In alternate embodiments of the present invention shown in FIG. 10, data 83 formed by the combination 82 of the digital image 80 and the corresponding mask image 81 may be quantized using different quantization methods 90, 91. Multiple histograms 100, 101 may be formed 92 based on multiple quantizations 102, 103. One histogram 106 from the multiple histograms 100, 101 may be selected 104 for the entropy calculation 105. In some embodiments, the entropy calculation may be made using the histogram with the largest single-bin count. In alternate embodiments, the histogram with the largest single lobe may be used.


In some embodiments of the present invention, a moving window of pixel values centered, in turn, on each pixel of the image, may be used to calculate the entropy measure for the block containing the centered pixel. The entropy may be calculated from the corresponding block in the masked image. The entropy value may be used to classify the pixel at the location on which the moving window is centered. FIG. 11 shows an exemplary embodiment in which a block of pixels is used to measure the entropy feature which is used to classify a single pixel in the block. In FIG. 11, a block 111 is shown for an image 110. The pixels in the masked image in the block 111 may be used to calculate the entropy measure, which may be considered the entropy measure at pixel 112. The pixel 112 in the center of the block may be classified according to the entropy measure.
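The moving-window classification of FIG. 11 may be sketched as follows. The sketch assumes a square window clipped at the image border, a hypothetical sentinel `MASK_VALUE`, and a simple threshold classifier; none of these choices is prescribed by the description.

```python
import math

MASK_VALUE = -1  # hypothetical mask-pixel value


def window_entropy(masked, r, c, half, num_bins):
    """Masked entropy of the (2*half+1)-square window centered at (r, c)."""
    hist = [0] * num_bins
    n = 0
    for rr in range(max(0, r - half), min(len(masked), r + half + 1)):
        for cc in range(max(0, c - half), min(len(masked[0]), c + half + 1)):
            value = masked[rr][cc]
            if value != MASK_VALUE:
                hist[value] += 1
                n += 1
    # An all-masked window leaves the sum empty and the entropy zero.
    return -sum((h / n) * math.log2(h / n) for h in hist if h > 0)


def classify_pixels(masked, half, num_bins, threshold):
    """Label each pixel True where its window entropy exceeds `threshold`."""
    return [[window_entropy(masked, r, c, half, num_bins) > threshold
             for c in range(len(masked[0]))]
            for r in range(len(masked))]


varied = [[0, 1, 2], [3, 0, 1], [2, 3, 0]]
uniform = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The block classification of FIG. 12 differs only in that one entropy value is computed per block and assigned to every pixel in that block.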


In other embodiments of the present invention, the entropy value may be calculated for a block of the image, and all pixels in the block may be classified with the same classification based on the entropy value. FIG. 12 shows an exemplary embodiment in which a block of pixels is used to measure the entropy feature which is used to classify all pixels in the block. In FIG. 12, a block 121 is shown for an image 120. The pixels in the masked image in the corresponding block may be used to calculate the entropy measure. All pixels 122 in the block 121 may be classified according to the entropy measure.


In some embodiments of the present invention shown in FIG. 13, the entropy may be calculated considering select lobes, also considered peaks, of the histogram. A digital image 80 and a corresponding mask image 81 may be combined 82 to form masked data 83. The masked data 83 may be quantized 84 forming quantized, masked data 85. The histogram 87 of the quantized, masked data 85 may be generated 86, a modified histogram 131 may be generated 130 to consider select lobes of the histogram 87, and an entropy measure 133 may be calculated 132 using the modified histogram of the quantized, masked data 131. In some embodiments, a single lobe of the histogram 87 may be considered. In some embodiments, the single lobe may be the lobe containing the image value of the center pixel of the window of image data for which the histogram may be formed.
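The single-lobe histogram modification may be sketched as follows. The description does not define how a lobe is delimited; this sketch assumes, purely for illustration, that a lobe is the contiguous run of non-empty bins surrounding the bin of the center pixel. The function name `select_lobe` is hypothetical.

```python
def select_lobe(hist, center_bin):
    """Return a modified histogram keeping only the lobe containing `center_bin`.

    The lobe is taken to be the contiguous run of non-empty bins around
    `center_bin` (an assumption). All other bins are zeroed.
    """
    modified = [0] * len(hist)
    if hist[center_bin] == 0:
        return modified  # center bin is empty: no lobe to keep
    lo = center_bin
    while lo > 0 and hist[lo - 1] > 0:
        lo -= 1
    hi = center_bin
    while hi < len(hist) - 1 and hist[hi + 1] > 0:
        hi += 1
    modified[lo:hi + 1] = hist[lo:hi + 1]
    return modified


two_lobes = [0, 3, 5, 0, 2, 2, 0]
```

The entropy measure would then be calculated from the modified histogram rather than from the full histogram.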



FIG. 14 shows embodiments of the present invention in which a digital image 140 may be combined 143 with output 142 of a pixel-selection module 141 to generate data 144 which may be considered in the entropy calculation. The data 144 may be quantized 145. A histogram 148 may be formed 147 from the quantized data 146, and an entropy measure 139 may be calculated 149 for the histogram 148. The pixel-selection module 141 comprises pixel-selection logic that may use multiple masks 137, 138 as input. A mask 137, 138 may correspond to an image structure. Exemplary image structures may include text, halftone, page background, and edges. The pixel-selection logic 141 generates a selection mask 142 that is combined with the digital image 140 to select image pixels that may be masked in the entropy calculation.
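The pixel-selection logic of FIG. 14 may be sketched as follows. The description leaves the logic open; this sketch assumes a logical OR of the input masks, so that a pixel flagged by any structure mask is masked in the entropy calculation. The names `selection_mask`, `apply_selection`, and `MASK_VALUE` are hypothetical.

```python
MASK_VALUE = -1  # hypothetical mask-pixel value


def selection_mask(*masks):
    """Combine binary structure masks (text, edges, ...) with a logical OR."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[any(m[r][c] for m in masks) for c in range(cols)]
            for r in range(rows)]


def apply_selection(image, selection):
    """Replace selected pixels with MASK_VALUE so they are skipped later."""
    return [[MASK_VALUE if selection[r][c] else pixel
             for c, pixel in enumerate(row)]
            for r, row in enumerate(image)]


text_mask = [[1, 0], [0, 0]]
edge_mask = [[0, 0], [0, 1]]
selected = selection_mask(text_mask, edge_mask)
masked_data = apply_selection([[5, 6], [7, 8]], selected)
```

Other combinations, such as requiring agreement between masks, could be substituted for the OR without changing the rest of the processing.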


In some embodiments of the present invention, the masking condition may be based on the edge strength at a pixel.


In some embodiments of the present invention, a level of confidence in the degree to which the masking condition is satisfied may be calculated. The level of confidence may be used when accumulating a pixel into the histogram. Exemplary embodiments in which a level of confidence is used are shown in FIG. 15.


In exemplary embodiments of the present invention shown in FIG. 15, a masked image 151 may be formed 152 from an input image 150. The masked image 151 may be formed by checking a masking condition at each pixel in the input image 150. An exemplary embodiment shown in FIG. 16 illustrates the formation 152 of the masked image 151. If an input image pixel 160 satisfies 162 the masking condition, the corresponding pixel in the masked image may be assigned 166 a value, a mask-pixel value, indicating that the masking condition is satisfied at that pixel. If an input image pixel 160 does not satisfy the masking condition 164, the corresponding pixel in the masked image may be assigned the value of the corresponding pixel in the input image 168. At pixels for which the masking condition is satisfied 162, a further assignment 165 of a confidence value reflecting the confidence in the mask signature signal may be made. The assignment of confidence value may be a separate value for the masked pixels, or the mask-pixel value may be multi-level with the levels representing the confidence. The masked image may mask pixels in the input image for which a masking condition is satisfied, and further identify the level to which the masking condition is satisfied.


In the exemplary embodiments of the present invention shown in FIG. 15, after forming 152 the masked image 151, a histogram 153 may be generated 154 for a block of the masked image 151. FIG. 17 shows an exemplary embodiment of histogram formation 154. A histogram with bins corresponding to the possible pixel values of the masked image may be formed according to FIG. 17. In some embodiments, all bins may be initially considered empty with initial count zero. The value of a pixel 170 in the block of the masked image may be compared 171 to the mask-pixel value. If the value of the pixel 170 is equal 172 to the mask-pixel value, then the pixel is accumulated 173 in the histogram at a fractional count based on the confidence value, and if there are pixels remaining in the block to examine 176, then the next pixel in the block is examined 171. If the value of the pixel 170 is not equal 174 to the mask-pixel value, then the pixel is accumulated in the histogram 175, meaning that the histogram bin corresponding to the value of the pixel is incremented, and if there are pixels remaining in the block to examine 177, then the next pixel in the block is examined 171.


When a pixel is accumulated in the histogram 175, a counter for counting the number of non-mask pixels in the block of the masked image may be incremented 178. When all pixels in a block have been examined 180, 179, the histogram may be normalized 130. The histogram may be normalized 130 by dividing each bin count by the number of non-mask pixels in the block of the masked image. In alternate embodiments, the histogram may not be normalized and the counter may not be present.
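The confidence-weighted accumulation of FIG. 17 may be sketched as follows. The description does not state how the fractional count is derived from the confidence value; this sketch assumes that the original pixel value remains available alongside a per-pixel mask confidence in the range 0 to 1 (as in a two-component representation), and that a masked pixel contributes a count of one minus its confidence. These are assumptions for illustration only.

```python
def confidence_histogram(values, mask_confidence, num_bins):
    """Accumulate pixels into a histogram using mask confidence.

    `values` are bin indices of the pixels in a block.
    `mask_confidence` gives, per pixel, the confidence that the masking
    condition is satisfied (0.0 for unmasked pixels).
    Returns (histogram, non_mask_count); normalization is left to the caller.
    """
    hist = [0.0] * num_bins
    non_mask_count = 0
    for value, confidence in zip(values, mask_confidence):
        if confidence > 0.0:
            hist[value] += 1.0 - confidence  # masked: fractional count (assumed form)
        else:
            hist[value] += 1.0               # unmasked: full count
            non_mask_count += 1
    return hist, non_mask_count


hist, count = confidence_histogram([0, 0, 1, 2], [0.0, 0.0, 1.0, 0.25], num_bins=3)
```

Under this assumed weighting, a pixel masked with full confidence contributes nothing, which reproduces the behavior of the embodiments without confidence levels.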


An entropy measure 155 may be calculated 156 for the histogram of a neighborhood of the masked image as described in the previous embodiments. In the embodiments of the present invention shown in FIG. 15, after calculating 156 the entropy measure 155 for the histogram 153 corresponding to a block of the image centered at a pixel, the pixel may be classified 157 according to the entropy feature 155. The classifier 157 shown in FIG. 15 may be based on thresholding. A threshold may be determined a priori, adaptively, or by any of numerous methods. The pixel may be classified 157 as belonging to one of two regions depending on which side of the threshold the entropy measure 155 falls.


In some embodiments of the present invention, the masking condition may comprise a single image condition. In some embodiments, the masking condition may comprise multiple image conditions combined to form a masking condition.


In some embodiments of the present invention, the entropy feature may be used to separate the image into two regions. In some embodiments of the present invention, the entropy feature may be used to separate the image into more than two regions.


In some embodiments of the present invention, the full dynamic range of the data may not be used. The histogram may be generated considering only pixels with values between a lower and an upper limit of dynamic range.


In some embodiments of the present invention, the statistical entropy measure may be as follows:







$$E = -\sum_{i=1}^{N} h(i) \cdot \log_2(h(i)),$$





where N is the number of bins, h(i) is the normalized ($\sum_{i=1}^{N} h(i) = 1$) histogram count for bin i, and log2(0)=1 may be defined for empty bins.


The maximum entropy may be obtained for a uniform histogram distribution, $h(i) = \frac{1}{N}$, for every bin. Thus,







$$E_{\max} = -\sum_{i=1}^{N} \frac{1}{N} \cdot \log_2\left(\frac{1}{N}\right) = -\log_2\left(\frac{1}{N}\right).$$
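As a brief worked step, the closed form for the maximum entropy may be rewritten using the identity that the logarithm of a reciprocal is the negative of the logarithm. Evaluating it for a nine-bin histogram, the bin count used in the exemplary integer calculation of this description, gives:

$$E_{\max} = -\log_2\left(\frac{1}{N}\right) = \log_2 N, \qquad N = 9 \;\Rightarrow\; E_{\max} = \log_2 9 \approx 3.17.$$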







The entropy calculation may be transformed into fixed-point arithmetic to return an unsigned, 8-bit, uint8, measured value, where zero corresponds to no entropy and 255 corresponds to maximum entropy. The fixed-point calculation may use two tables: one table to replace the logarithm calculation, denoted log_table below, and a second table to implement division in the histogram normalization step, denoted rev_table. Integer entropy calculation may be implemented as follows for an exemplary histogram with nine bins:







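$$
\begin{aligned}
\mathrm{log\_table}[i] &= 2^{\mathrm{log\_shift}} \cdot \log_2(i) \\
s &= \sum_{i=0}^{8} \mathrm{hist}[i] \\
\mathrm{rev\_table}[i] &= \frac{2^{\mathrm{rev\_shift}} \cdot \frac{255}{E_{\max}}}{i} \\
\mathrm{s\_log} &= \mathrm{log\_table}[s] \\
\mathrm{s\_rev} &= \mathrm{rev\_table}[s] \\
\mathrm{bv}[i] &= \mathrm{hist}[i] \cdot \mathrm{s\_rev} \\
\mathrm{log\_diff}[i] &= \mathrm{s\_log} - \mathrm{log\_table}[\mathrm{hist}[i]] \\
E &= \left( \sum_{i=0}^{\mathrm{NBins}} \Big( \big(\mathrm{bv}[i] \cdot \mathrm{log\_diff}[i]\big) \gg \big(\mathrm{log\_shift} + \mathrm{rev\_shift} - \mathrm{accum\_shift}\big) \Big) \right) \gg \mathrm{accum\_shift}
\end{aligned}
$$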





where log_shift, rev_shift, and accum_shift may be related to the precision of the log, division, and accumulation operations, respectively.
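The table-based integer calculation above may be sketched in Python as follows. The table names and shift parameters follow the description; the table-building helper `build_tables`, the rounding of table entries, the zero entries at index 0, the final clamp to 255, and the particular shift values used in the example are assumptions made for illustration.

```python
import math


def build_tables(max_count, num_bins, log_shift, rev_shift):
    """Build log_table and rev_table for pixel counts 0..max_count.

    Index 0 is set to 0 in both tables (an assumption); empty bins then
    contribute nothing because bv[i] is zero for them.
    """
    e_max = math.log2(num_bins)
    log_table = [0] + [round((1 << log_shift) * math.log2(i))
                       for i in range(1, max_count + 1)]
    rev_table = [0] + [round((1 << rev_shift) * (255.0 / e_max) / i)
                       for i in range(1, max_count + 1)]
    return log_table, rev_table


def integer_entropy(hist, log_table, rev_table, log_shift, rev_shift, accum_shift):
    """Entropy of an integer histogram scaled to the range 0..255."""
    s = sum(hist)
    if s == 0:
        return 0  # every pixel masked: define the measure as zero
    s_log = log_table[s]
    s_rev = rev_table[s]
    accumulator = 0
    for count in hist:
        bv = count * s_rev                    # scaled normalized bin value
        log_diff = s_log - log_table[count]   # scaled -log2(count / s)
        accumulator += (bv * log_diff) >> (log_shift + rev_shift - accum_shift)
    return min(255, accumulator >> accum_shift)


LOG_SHIFT, REV_SHIFT, ACCUM_SHIFT = 12, 16, 8  # example precisions only
log_table, rev_table = build_tables(64, 9, LOG_SHIFT, REV_SHIFT)
flat = integer_entropy([4] * 9, log_table, rev_table,
                       LOG_SHIFT, REV_SHIFT, ACCUM_SHIFT)
peaked = integer_entropy([36] + [0] * 8, log_table, rev_table,
                         LOG_SHIFT, REV_SHIFT, ACCUM_SHIFT)
```

The subtraction s_log - log_table[hist[i]] works because the logarithm of the normalized count hist[i]/s equals the difference of the two logarithms, so no division is needed inside the loop. Because right shifts truncate, the result for a uniform histogram may fall slightly below 255.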


An alternate hardware implementation may use an integer divide circuit to calculate n, the normalized histogram bin value.






$$
\begin{aligned}
n &= (\mathrm{hist}[i] \ll 8) / s \\
\mathrm{Ebin} &= (81 \cdot n \cdot \mathrm{log\_table}[n]) \gg 16 \\
E &= \sum_{i=0}^{\mathrm{NBins}} \mathrm{Ebin}[i].
\end{aligned}
$$







In the example, the number of bins is nine (N=9), which makes the normalization multiplier 255/Emax approximately 81 (255/log2(9) is about 80.4).


The fixed-point precision of each calculation step may be adjusted depending upon the application and properties of the data being analyzed. Likewise the number of bins may also be adjusted.


In some embodiments of the present invention, pictorial regions may be detected in an image using a staged refinement process that may first analyze the image and its derived image features to determine likely pictorial regions. Verification and refinement stages may follow initial determination of the likely pictorial regions. In some embodiments of the present invention, masked entropy may be used to initially separate pictorial image regions from non-pictorial image regions. Due to the uniform nature of page background and local background regions in a digital image, such regions will have low entropy measures. Pictorial regions may have larger entropy measures due to the varying luminance and chrominance information in pictorial regions compared to the more uniform background regions. Text regions, however, may also have large entropy measures due to the edge structure of text. It may be desirable to mask text pixels when determining entropy measures for identifying pictorial regions in images. Alternatively, masking of all strong edge structures, which may include buildings, signs, and other man-made structures in pictorial regions in addition to text, may reduce identification of text regions as pictorial regions while not significantly reducing the identification of pictorial regions. While pictorial regions typically have greater entropy measures, more uniform pictorial regions, such as sky regions, may have low entropy measures, and such regions may be missed in the detection of pictorial regions based on entropy or masked entropy.


Some embodiments of the present invention shown in FIG. 18 may include refinement 184 of the initial pictorial map 183 detected 182 based on masked entropy measures in the digital image 181. In some embodiments, verification 186 may follow the refinement 184.


In some embodiments of the present invention, the initial pictorial map 183 may be generated as shown in FIG. 19. In these embodiments, the initial pictorial map 183 may be generated by a region growing process 192. The region growing process 192 may use pictorial-region seeds 193 that may result from pictorial detection 190 based on masked entropy features of the image 191. The pictorial-region seeds 193 may be those pixels in the digital image for which the masked entropy measure 191 may be considered reliable. Those pixels with high masked entropy may be considered pixels for which the masked entropy feature is most reliable. Such pixels may form the seeds 193 used in the region growing 192 of the embodiments of the present invention shown in FIG. 19. A threshold may be used to determine the pictorial-region seeds 193. In some embodiments of the present invention, domain knowledge may be used to determine the threshold. In some embodiments, the pixels with the highest 10 percent of the masked entropy values in the image may be used as pictorial-region seeds 193.
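Selection of pictorial-region seeds from the highest masked entropy values may be sketched as follows. The function name `pictorial_seeds` is hypothetical, and the handling of ties (all pixels equal to the cutoff value become seeds) is an assumption of this sketch.

```python
def pictorial_seeds(entropy_map, fraction=0.10):
    """Mark as seeds the pixels whose masked entropy is in the top `fraction`.

    The threshold is the smallest value among the top-ranked pixels, so
    tied values at the cutoff are all included.
    """
    ranked = sorted((v for row in entropy_map for v in row), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    threshold = ranked[keep - 1]
    return [[v >= threshold for v in row] for row in entropy_map]


seeds = pictorial_seeds([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
```

A fixed threshold chosen from domain knowledge could replace the percentile rule without altering the subsequent region growing.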


The region growing 192 from the pictorial-region seeds 193 may be controlled by bounding conditions. Pictorial regions may be grown from the high-confidence pictorial-region seeds into the less reliable pictorial-feature response areas. In some embodiments, the pictorial region may be grown until a pixel with a low-confidence level is encountered. In this way, pictorial regions may be grown to include pixels based on their connectivity to those pixels with a strong pictorial-feature response.


In some embodiments, additional information may be used in the region growing process. In some embodiments the additional information may be related to background region identification. A labeled background map indicating background regions may be used in the region growing. In some embodiments, the labeled background map may include, in addition to indices indicating membership in a background region and indexing a background color palette, two reserved labels. One of the reserved labels may represent candidate pictorial pixels as identified by the background color analysis and detection, and the other reserved label may represent pixels with unreliable background color analysis and labeling. In some embodiments, the map label “1” may indicate that a pixel belongs to a candidate pictorial region. The map labels “2” through “254” may indicate background regions, and the map label “255” may represent an unknown or unreliable region.


In some embodiments, the region growing may proceed into regions of low confidence if those regions were labeled as pictorial candidates by the background color analysis and labeling. The pictorial regions may not grow into regions labeled as background. When the growing process encounters a pixel labeled as unknown or unreliable, the growing process may use a more conservative bounding condition or tighter connectivity constraints to grow into the unknown or unreliable pixel. In some embodiments, a more conservative bounding condition may correspond to a higher confidence level threshold. In some embodiments, if a candidate pixel is labeled as a pictorial candidate by the background color analysis, only one neighboring pixel may be required to belong to a pictorial region for the pictorial region to grow to the candidate pixel. If the candidate pixel is labeled as unknown or unreliable by the background color analysis, at least two neighboring pixels may be required to belong to a pictorial region for the pictorial region to grow to the candidate pixel. The neighboring pixels may be the causal neighbors for a particular scan direction, the four or eight nearest neighbors, or any other defined neighborhood of pixels. In some embodiments of the present invention, the connectivity constraint may be adaptive.
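The label-dependent growing rules can be illustrated with the sketch below. It uses the exemplary label values given earlier (1 for pictorial candidates, 2 through 254 for background, 255 for unknown) and an eight-neighbor window. Combining the two-neighbor requirement with a stricter confidence threshold for unknown pixels is an assumption made here for illustration, as are the names `grow_with_background_map` and `strict_confidence`; labels outside the listed values are simply treated as not growable.

```python
PICTORIAL_CANDIDATE = 1  # exemplary label for candidate pictorial pixels
UNKNOWN = 255            # exemplary label for unknown or unreliable pixels


def grow_with_background_map(seeds, confidence, labels, strict_confidence):
    """Grow seeds using a labeled background map and 8-neighbor connectivity."""
    rows, cols = len(seeds), len(seeds[0])
    pictorial = [row[:] for row in seeds]

    def pictorial_neighbors(r, c):
        # Count pictorial pixels among the (up to) eight nearest neighbors.
        return sum(
            pictorial[nr][nc]
            for nr in range(max(0, r - 1), min(rows, r + 2))
            for nc in range(max(0, c - 1), min(cols, c + 2))
            if (nr, nc) != (r, c)
        )

    changed = True
    while changed:  # repeat raster scans until no pixel is added
        changed = False
        for r in range(rows):
            for c in range(cols):
                if pictorial[r][c]:
                    continue
                count = pictorial_neighbors(r, c)
                if labels[r][c] == PICTORIAL_CANDIDATE:
                    grow = count >= 1  # one pictorial neighbor suffices
                elif labels[r][c] == UNKNOWN:
                    # Tighter connectivity plus a stricter threshold (assumed).
                    grow = count >= 2 and confidence[r][c] >= strict_confidence
                else:
                    grow = False  # never grow into background labels
                if grow:
                    pictorial[r][c] = True
                    changed = True
    return pictorial
```

Because pixels are only ever added, the repeated scans settle on the same final map regardless of scan order.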


In some embodiments of the present invention, refinement may be performed after initial region growing as described above. FIG. 20 shows an exemplary pictorial region 200 with the results of the region growing 202. Two regions 204, 206 were missed in the initial region growing. Refinement of the initial pictorial map may detect such missed regions. In some embodiments, interior holes in a pictorial region, such as 206 in the exemplary pictorial region shown in FIG. 20, may be detected and labeled as pictorial using any hole-filling method, for example, a flooding algorithm or a connected components algorithm. In some embodiments, concave regions 204 may be filled based on a bounding shape computed for the pictorial region. If a uniform color, or substantially uniform color, surrounds the bounding shape determined for a pictorial region, then concave regions on the boundary of the pictorial region may be labeled as belonging to the pictorial region. A bounding shape may be computed for each region. In some embodiments, the bounding shape may be a rectangle forming a bounding box for the region.
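One possible flooding-based hole fill is sketched below; it covers only interior holes such as region 206, not the concave-region filling. The approach floods the non-pictorial pixels reachable from the image border and treats every unreached non-pictorial pixel as a hole. The function name `fill_holes` and the use of 4-connectivity for the flood are assumptions for illustration.

```python
from collections import deque


def fill_holes(pictorial):
    """Label interior holes of a boolean pictorial map as pictorial."""
    rows, cols = len(pictorial), len(pictorial[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Start the flood from every non-pictorial pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not pictorial[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    # Spread through 4-connected non-pictorial pixels.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                if not pictorial[nr][nc] and not outside[nr][nc]:
                    outside[nr][nc] = True
                    queue.append((nr, nc))
    # Anything not reached by the flood is either pictorial or a hole.
    return [
        [pictorial[r][c] or not outside[r][c] for c in range(cols)]
        for r in range(rows)
    ]
```

A concave notch that touches the outside of the region is reached by the flood and therefore left unfilled, which is why a separate bounding-shape test is described for such regions.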


In some embodiments of the present invention, verification of the refined pictorial map may follow. Pictorial map verification may be based on the size of a pictorial region. Small regions identified as pictorial regions may be removed and relabeled. In some embodiments, regions identified as pictorial regions may be eliminated from the pictorial region classification by the verification process based on the shape of the region, the area of the region within a bounding shape, the distribution of the region within a bounding shape, or a document layout criterion. In alternate embodiments, verification may be performed without refinement. In alternate embodiments, hole-filling refinement may be followed by small-region verification which may be subsequently followed by concave-region-filling refinement.
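Size-based verification might be sketched as a connected-components pass that drops small regions. The name `remove_small_regions`, the `min_size` parameter (a pixel count), and the choice of 4-connectivity are assumptions for illustration; the shape, area, distribution, and layout criteria mentioned above are not covered.

```python
from collections import deque


def remove_small_regions(pictorial, min_size):
    """Keep only 4-connected pictorial regions with at least `min_size` pixels."""
    rows, cols = len(pictorial), len(pictorial[0])
    verified = [[False] * cols for _ in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not pictorial[r0][c0] or visited[r0][c0]:
                continue
            # Collect one connected component with a breadth-first search.
            component = []
            queue = deque([(r0, c0)])
            visited[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols:
                        if pictorial[nr][nc] and not visited[nr][nc]:
                            visited[nr][nc] = True
                            queue.append((nr, nc))
            # Regions below the size limit are relabeled as non-pictorial.
            if len(component) >= min_size:
                for r, c in component:
                    verified[r][c] = True
    return verified
```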


The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding equivalence of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.

Claims
  • 1. A method for detecting a pictorial region in a digital image, said method comprising: forming a masked image of a digital image, wherein said masked image comprises a first plurality of pixel locations, whereat each pixel in said first plurality of pixel locations in said masked image is assigned a mask-pixel value, wherein said first plurality of pixel locations in said masked image corresponds to a first plurality of pixel locations in said digital image whereat a masking condition based on edge strength is satisfied; and said masked image further comprises a second plurality of pixel locations, wherein said second plurality of pixel locations in said masked image corresponds to a second plurality of pixel locations in said digital image whereat said masking condition is not satisfied, and wherein each pixel in said second plurality of pixel locations in said masked image is assigned a value of said corresponding pixel in said digital image; in a region-detection system comprising a calculator, calculating a masked entropy value for each pixel in a third plurality of pixels in said masked image, wherein said masked entropy value for said each pixel is based on a frequency-of-occurrence, in said masked image, of pixel values not equal to said mask-pixel value, in a region proximate to said each pixel, and a function of said frequency-of-occurrence, wherein said function comprises a mathematical characteristic similar to a logarithmic function; determining, in said region-detection system, a confidence level for said each pixel in said third plurality of pixels in said masked image based on said masking condition; determining, in said region-detection system, a seed region comprising a seed-region pixel, wherein said determining a seed region comprises selecting said seed-region pixel, from said third plurality of pixels in said masked image, when said masked entropy value calculated for said seed-region pixel is a reliable value, wherein said reliable value is a relatively high masked entropy value; and in said region-detection system, growing said seed region based on said confidence levels thereby producing a pictorial region wherein said pictorial region comprises pictorial-region pixels from said third plurality of pixels in said masked image.
  • 2. A method as described in claim 1 further comprising refining said pictorial region thereby producing a refined pictorial region.
  • 3. A method as described in claim 2 wherein said refining comprises filling at least one of holes and concave boundary regions in said pictorial region.
  • 4. A method as described in claim 1 further comprising verifying said pictorial region thereby producing a verified pictorial region.
  • 5. A method as described in claim 4 wherein said verifying said pictorial region comprises determining at least one of the size of said pictorial region, the shape of said pictorial region, the area of said pictorial region within a first bounding shape, and the distribution of said pictorial region with a second bounding shape.
  • 6. A method as described in claim 1 wherein said determining said seed region further comprises: determining a first confidence-level threshold value; and identifying a target pixel from said third plurality of pixels in said masked image as one of said seed-region pixels if said confidence level for said target pixel is greater than said first confidence-level threshold value.
  • 7. A method as described in claim 1 wherein said growing further comprises: determining a second confidence-level threshold value; and identifying a candidate pixel from said third plurality of pixels in said masked image as a pictorial-region pixel if said candidate pixel is connected to said seed region and said confidence level for said candidate pixel is greater than said second confidence-level threshold value.
  • 8. A method as described in claim 6 wherein said first confidence-level threshold is based on the range of said confidence levels.
  • 9. A method as described in claim 7 further comprising: obtaining a labeled background map for said digital image; and identifying said candidate pixel from said third plurality of pixels in said masked image as a pictorial-region pixel if said candidate pixel is less than said second confidence-level threshold value and said candidate pixel is labeled as pictorial content in said labeled background map.
  • 10. A system for detecting a pictorial region in a digital image, said system comprising: a mask generator for forming a masked image of a digital image, wherein said masked image comprises a first plurality of pixel locations, whereat each pixel in said first plurality of pixel locations in said masked image is assigned a mask-pixel value, wherein said first plurality of pixel locations in said masked image corresponds to a first plurality of pixel locations in said digital image whereat a masking condition based on edge strength is satisfied; and said masked image further comprises a second plurality of pixel locations, wherein said second plurality of pixel locations in said masked image corresponds to a second plurality of pixel locations in said digital image whereat said masking condition is not satisfied, and wherein each pixel in said second plurality of pixel locations in said masked image is assigned a value of said corresponding pixel in said digital image; a calculator processor for calculating a masked entropy value for each pixel in a third plurality of pixels in said masked image, wherein said masked entropy value for said each pixel is based on a frequency-of-occurrence, in said masked image, of pixel values not equal to said mask-pixel value, in a region proximate to said each pixel, and a function of said frequency-of-occurrence, wherein said function comprises a mathematical characteristic similar to a logarithmic function; a first determiner for determining a confidence level for said each pixel in said third plurality of pixels in said masked image based on said masking condition; a second determiner for determining seed regions based on said masked entropy values, wherein said second determiner determines a first pixel is a seed-region pixel when a first masked entropy value calculated for said first pixel is considered a reliable value based on a relative greatness of said first masked entropy value in relation to a plurality of other masked entropy values calculated; and a region grower for growing said regions based on said confidence levels.
  • 11. A system as described in claim 10 further comprising a refiner for refining said pictorial region thereby producing a refined pictorial region.
  • 12. A system as described in claim 11 wherein said refining comprises filling at least one of holes and concave boundary regions in said pictorial region.
  • 13. A system as described in claim 10 further comprising a verifier for verifying said pictorial region thereby producing a verified pictorial region.
  • 14. A system as described in claim 13 wherein said verifier comprises at least one of a size determiner for determining the size of said pictorial region, a shape determiner for determining the shape of said pictorial region, an area determiner for determining the area of said pictorial region within a first bounding shape, and a distribution determiner for determining the distribution of said pictorial region within a second bounding shape.
  • 15. A system as described in claim 10 wherein said second determiner for determining said seed region further comprises: a first-confidence-level-threshold determiner for determining a first confidence-level threshold value; and a seed-region-pixel identifier for identifying a target pixel from said third plurality of pixels in said masked image as one of said seed-region pixels if said confidence level for said target pixel is greater than said first confidence-level threshold value.
  • 16. A system as described in claim 10 wherein said growing further comprises: a second-confidence-level-threshold determiner for determining a second confidence-level threshold value; and a first pictorial-region-pixel identifier for identifying a candidate pixel from said third plurality of pixels in said masked image as a pictorial-region pixel if said candidate pixel is connected to said seed region and said confidence level for said candidate pixel is greater than said second confidence-level threshold value.
  • 17. A system as described in claim 15 wherein said first confidence-level threshold is based on the range of said confidence levels.
  • 18. A system as described in claim 16 further comprising: an obtainer for obtaining a labeled background map for said digital image; and a second pictorial-region-pixel identifier for identifying said candidate pixel from said third plurality of pixels in said masked image as a pictorial-region pixel if said candidate pixel is less than said second confidence-level threshold value and said candidate pixel is labeled as pictorial content in said labeled background map.
  • 19. A method for detecting a pictorial region in a digital image, said method comprising: forming a masked image of a digital image, wherein said masked image comprises a first plurality of pixel locations, whereat each pixel in said first plurality of pixel locations in said masked image is assigned a mask-pixel value, wherein said first plurality of pixel locations in said masked image corresponds to a first plurality of pixel locations in said digital image whereat a masking condition related to edge strength is satisfied; and said masked image further comprises a second plurality of pixel locations, wherein said second plurality of pixel locations in said masked image corresponds to a second plurality of pixel locations in said digital image whereat said masking condition is not satisfied, and wherein each pixel in said second plurality of pixel locations in said masked image is assigned a value of said corresponding pixel in said digital image; in a region-detection system comprising a calculator, calculating a masked entropy value for each pixel in a third plurality of pixels in said masked image, wherein said masked entropy value for said each pixel is based on a frequency-of-occurrence, in said masked image, of pixel values not equal to said mask-pixel value, in a region proximate to said each pixel, and a function of said frequency-of-occurrence, wherein said function comprises a mathematical characteristic similar to a logarithmic function; in said region-detection system, determining a confidence level for said each pixel in said third plurality of pixels in said masked image based on said masking condition associated with said mask; in said region-detection system, determining a seed region comprising a seed-region pixel, wherein said determining a seed region comprises selecting said seed-region pixel, from said third plurality of pixels in said masked image, when said masked entropy value calculated for said seed-region pixel is a reliable value, wherein a said reliable value is a relatively high masked entropy value; in said region-detection system, obtaining a labeled background map for said digital image wherein said labeled background map comprises a label corresponding to pictorial content; in said region-detection system, growing said seed region based on said confidence levels and said labeled background map thereby producing a pictorial region wherein said pictorial region comprises pictorial-region pixels from said third plurality of pixels in said masked image; in said region-detection system, refining said pictorial region thereby producing a refined pictorial region; and in said region-detection system, verifying said refined pictorial region thereby producing a verified pictorial region.
  • 20. The method as described in claim 19 further comprising: determining a first confidence-level threshold value; identifying a target pixel from said third plurality of pixels in said masked image as one of said seed-region pixels if said confidence level for said target pixel is greater than said first confidence-level threshold value; determining a second confidence-level threshold value; identifying a candidate pixel from said third plurality of pixels in said masked image as a pictorial-region pixel if said candidate pixel is connected to said seed region and said confidence level for said candidate pixel is greater than said second confidence-level threshold value; and identifying said candidate pixel from said third plurality of pixels in said masked image as a pictorial-region pixel if said candidate pixel is less than said second confidence-level threshold value and said candidate pixel is labeled as pictorial content in said labeled background map.
RELATED REFERENCES

This application is a continuation-in-part of U.S. patent application Ser. No. 11/367,244, entitled “Methods and Systems for Detecting Regions in Digital Images,” filed on Mar. 2, 2006.

Related Publications (1)
Number Date Country
20070206857 A1 Sep 2007 US
Continuation in Parts (1)
Number Date Country
Parent 11367244 Mar 2006 US
Child 11424296 US