The present invention is, in general, related to systems and methods for processing a digitally captured image and, more particularly, for automatically detecting a perspective distorted document in a digitally captured image.
The analog information produced by the photosensitive capacitive elements is converted to digital information by analog-to-digital (A/D) conversion unit 104. A/D conversion unit 104 may convert the analog information received from CCD 103 in either a serial or parallel manner. The converted digital information may be stored in memory 105 (e.g., random access memory). The digital information is then processed by processor 106 according to control software stored in ROM 107 (e.g., PROM, EPROM, EEPROM, and/or the like). For example, the digital information may be compressed according to the Joint Photographic Experts Group (JPEG) standard. Additionally or alternatively, other circuitry (not shown) may be utilized to process the captured image such as an application specific integrated circuit (ASIC). User interface 108 (e.g., a touch screen, keys, and/or the like) may be utilized to edit the captured and processed image. The image may then be provided to output port 109. For example, the user may cause the image to be downloaded to a personal computer (not shown) via output port 109.
The quality of the captured image is dependent on the perspective or positioning of digital camera 100 with respect to document 101. Specifically, if digital camera 100 is off-angle, the captured image of document 101 may be skewed as shown in captured image 150 of
Accordingly, the image data may be uploaded to a personal computer for processing by various known correction algorithms. The algorithms are employed to correct the distortion effects associated with off-angle images of documents. Typical known correction algorithms require a user to manually identify the corners of a region of a captured image. By measuring the spatial displacement of the identified corners from desired positions associated with a rectangular arrangement, an estimation of the amount of distortion is calculated. The correction algorithm then processes the imaged document to possess the desired perspective and size as necessary and may produce perspective enhanced image 200 of
An automatic corner detection algorithm is described by G. F. McLean in Geometric Correction of Digitized Art, GRAPHICAL MODELS AND IMAGE PROCESSING, Vol. 58, No. 2, March, pp. 142-154 (1996). McLean's algorithm is intended to correct the perspective distortion associated with “archival images of two-dimensional art objects.” Accordingly, the algorithm assumes that some degree of care was taken during the imaging or photography. Thus, the algorithm assumes that the resulting distorted quadrilaterals of the art form “a set of roughly 90° corners.” Upon this assumption, the corners may be estimated by analyzing the intersection of lines that form approximately 90° interior angles. Although the McLean algorithm is clearly advantageous as compared to pure manual selection of corners, the assumptions of this algorithm are not always appropriate for digital images (e.g., those taken by casual users) which may exhibit appreciable perspective distortion.
In one embodiment, the present invention is directed to a method for processing a digitally captured image that comprises an imaged document. The method comprises: detecting graphical information related to spatial discontinuities of the digitally captured image; detecting lines from the detected graphical information; computing effective area parameters for quadrilaterals associated with ones of the detected lines, wherein each effective area parameter for a respective quadrilateral equals an area of the respective quadrilateral modified by at least a corner matching score that is indicative of a number of connected edge pixels in corners of the respective quadrilateral; and selecting a quadrilateral of the quadrilaterals that possesses a largest effective area parameter.
Embodiments of the present invention are operable to process a digitally captured image that comprises a perspective distorted document. Embodiments of the present invention are operable to automatically detect the imaged document from the digitally captured image without requiring interaction from a user. After detection, embodiments of the present invention are operable to correct the perspective distortion of the imaged document and may perform scaling as desired. For example, embodiments of the present invention may process digitally captured image 150 of
In step 301 of flowchart 300, image data of a captured document is received. The image data may be encoded utilizing any suitable graphical encoding format including but not limited to Tag Image File Format (TIFF), Joint Photographic Experts Group (JPEG) format, Graphics Interchange Format (GIF), Portable Network Graphics (PNG), bit-mapped (BMP) format, and/or the like.
In step 302, if the image data is in a color format, the color image is transformed into a luminance image (e.g., gray-scale values). For example, if the original image data is in RGB format (where R, G, and B respectively represent the intensities of the red, green, and blue chromatic components), the image data may be transformed as follows:
Y=int(0.299*R+0.587*G+0.114*B+0.5),
where Y is the luminance value according to the YCrCb encoding scheme. It shall be appreciated that the present invention is not limited to any particular color coordinate system and other transforms may be utilized according to embodiments of the present invention.
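As a minimal sketch of the transform above, the luminance of a single pixel might be computed as follows. Python is used for illustration only, and the function name `rgb_to_luminance` is hypothetical rather than part of the described embodiments.

```python
def rgb_to_luminance(r, g, b):
    """Return the rounded luminance Y for R, G, B intensities."""
    # Adding 0.5 before truncation rounds to the nearest integer.
    return int(0.299 * r + 0.587 * g + 0.114 * b + 0.5)
```

For 8-bit inputs the result stays within the 0 to 255 range because the three weights sum to one.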
In step 303, a working copy of the image may be created by down-sampling the image. The down-sampling reduces the complexity and, hence, processing time associated with the process flow. Down-sampling may occur utilizing any number of techniques. For example, the original luminance image may possess X×Y pixels which are divided into groups or blocks of pixels that are N×N pixels wide. In embodiments of the present invention, N is selected to equal seven (7), although other values may be utilized. The luminance of each block may be averaged. The average value for each block may be used as or mapped to the luminance for a single down-sampled pixel of the work copy. In this case, the work copy possesses (X/N)×(Y/N) down-sampled pixels possessing average values related to the respective blocks of the luminance map.
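The block averaging described above may be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name `downsample` is hypothetical, the image is a list of rows, and rows or columns that do not fill a complete N×N block are simply dropped (the text does not say how partial blocks are handled).

```python
def downsample(luma, n=7):
    """Average each n-by-n block of a 2-D list of luminance values.

    Assumption: trailing rows/columns that do not fill a whole block
    are dropped, so the result has (rows // n) x (cols // n) pixels.
    """
    rows, cols = len(luma) // n, len(luma[0]) // n
    out = []
    for bi in range(rows):
        out_row = []
        for bj in range(cols):
            total = 0
            for i in range(bi * n, (bi + 1) * n):
                for j in range(bj * n, (bj + 1) * n):
                    total += luma[i][j]
            out_row.append(total / (n * n))  # block average
        out.append(out_row)
    return out
```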
In step 304, the working copy is smoothed utilizing a low-pass filter. The following Gaussian filter may be used to smooth the working copy:
$$f_{i,j} = k\,e^{-\alpha^{2}\left[(i-c)^{2}+(j-c)^{2}\right]/c^{2}},$$
where k is a normalizing factor such that
$$\sum_{i,j} f_{i,j} = 1.0,$$
and c is the center of the filter. In embodiments of the present invention the size of the filter is selected to equal 5×5 pixels and α is selected to equal 1.7, although other values may be utilized.
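The smoothing filter can be sketched in Python as follows. The sketch assumes that the exponent has the form −α²[(i−c)²+(j−c)²]/c² and that k scales the coefficients so that they sum to one. Both the function name `gaussian_kernel` and this exact exponent form are assumptions made for illustration.

```python
import math

def gaussian_kernel(size=5, alpha=1.7):
    """Build a size-by-size kernel f[i][j] = k * exp(-alpha^2 * r2 / c^2).

    r2 is the squared distance from the center c = (size - 1) / 2.
    Assumption: this exponent form is inferred, not quoted; size >= 3.
    """
    c = (size - 1) / 2.0
    raw = [[math.exp(-alpha ** 2 * ((i - c) ** 2 + (j - c) ** 2) / c ** 2)
            for j in range(size)] for i in range(size)]
    k = 1.0 / sum(sum(row) for row in raw)  # normalizing factor
    return [[k * v for v in row] for row in raw]
```

The smoothed working copy would then be obtained by convolving the down-sampled luminance image with this kernel.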
In step 305, edge detection is performed upon the down-sampled and smoothed pixels. Edge detection generally refers to the processing of a graphical image to detect or identify spatial discontinuities of an image. Edge detection may include various steps to enhance graphical data that is related to such discontinuities and to suppress graphical data that is not related to such discontinuities before performing the detection process. Edge detection may occur by utilizing the Canny edge detection algorithm which is described in “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, November 1986, which is incorporated herein by reference. Other edge detection algorithms may also be utilized. Edge pixels or edge points may refer to the pixels that retain non-zero values after the edge detection processing occurs.
In step 306, noisy edges are removed. For each edge pixel detected by the edge detection algorithm, neighboring edge pixels are counted within a square window (e.g., five by five (5×5)). For example, an edge pixel located at (i, j) would have neighboring edge pixels within the window having corners (i−w, j−w), (i+w, j−w), (i−w, j+w), and (i+w, j+w), where w is the half-width of the window (e.g., w equals two (2) for a five by five (5×5) window). If the number of edge pixels within the square window is less than CL or larger than CH, the respective pixel is removed. In embodiments of the present invention, CL is selected to equal three (3) and CH is selected to equal nine (9) for a five by five (5×5) window, although other values may be utilized. Additionally, after removing edge pixels via the window analysis, additional edge pixels may be removed by tracing edge 8-neighbor connectivity. Specifically, if an edge trace is shorter than a threshold (i.e., the trace contains fewer pixels than the threshold), all edge pixels in the trace are removed.
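The window analysis of step 306 may be sketched as follows. This Python sketch is illustrative only: the function name `remove_noisy_edges` is hypothetical, the count is assumed to include the pixel under test, and windows are assumed to be clipped at the image border. The subsequent removal of short edge traces is not shown.

```python
def remove_noisy_edges(edges, w=2, c_low=3, c_high=9):
    """Drop edge pixels whose (2w+1)-square window is too sparse or too dense.

    edges: 2-D list, non-zero = edge pixel.
    Assumption: the count includes the pixel under test.
    """
    rows, cols = len(edges), len(edges[0])
    out = [row[:] for row in edges]  # decisions use the original map
    for i in range(rows):
        for j in range(cols):
            if not edges[i][j]:
                continue
            count = 0
            for a in range(max(0, i - w), min(rows, i + w + 1)):
                for b in range(max(0, j - w), min(cols, j + w + 1)):
                    if edges[a][b]:
                        count += 1
            if count < c_low or count > c_high:
                out[i][j] = 0
    return out
```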
In step 307, line detection and selection are performed. For example, the Hough transform (see, for example, Digital Image Processing, Rafael C. Gonzalez and Paul Wintz, (1987 Second Edition), which is incorporated herein by reference) may be utilized to detect edges that approximate straight lines. Each line is assigned a metric that is referred to as the Effective Length (EL). Utilizing the EL metric, a predetermined number of detected lines are selected as candidates for page edges. In embodiments of the present invention, thirty-five (35) lines are selected, although other numbers of lines may be selected.
In step 308, the parameters of all possible quadrilaterals (four-sided polygons) are computed and, based on the computed parameters, a candidate group of quadrilaterals is selected.
In step 309, the effective area of each quadrilateral of the candidate group is computed and the quadrilateral associated with the largest effective area metric is selected. The effective area metric utilizes the area of the respective quadrilateral modified by a corner matching score and an edge matching score. In embodiments of the present invention, the corner matching score is calculated utilizing, in part, the number of connected edge pixels associated with corners of a respective quadrilateral. In embodiments of the present invention, the edge matching score is calculated utilizing, in part, the number of connected edge pixels of sides of the respective quadrilateral. By utilizing an effective area parameter calculated in this manner, embodiments of the present invention enable perspective correction of relatively highly skewed imaged documents (e.g., those taken by casual users). Thus, embodiments of the present invention are not limited to correcting perspective of quadrilaterals that form a set of roughly 90° corners. Moreover, an effective area parameter calculated in this manner enables performance of perspective correction to be relatively robust to noise in the digital image.
In step 310, the corner locations are refined using a minimum mean square error (MMSE) line-fitting procedure. Specifically, lines are selected from the detected edges that minimize the mean square error of the line to the detected edges. The lines that satisfy the MMSE condition are used to define the corner locations.
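One common way to realize a least-squares line fit is orthogonal regression, sketched below in Python. This is an illustrative assumption rather than the procedure of the embodiments: the text does not state whether the error is measured perpendicular to the line, and the function name `fit_line` is hypothetical. The fitted line is returned in normal form, with ρ = i·sin θ + j·cos θ for points (i, j) on the line.

```python
import math

def fit_line(points):
    """Orthogonal least-squares fit; returns (rho, theta) such that
    rho = i*sin(theta) + j*cos(theta) for points (i, j) on the line."""
    n = len(points)
    mi = sum(p[0] for p in points) / n
    mj = sum(p[1] for p in points) / n
    sii = sum((p[0] - mi) ** 2 for p in points)
    sjj = sum((p[1] - mj) ** 2 for p in points)
    sij = sum((p[0] - mi) * (p[1] - mj) for p in points)
    # Direction of the best-fit line, measured from the j (horizontal) axis.
    phi = 0.5 * math.atan2(2.0 * sij, sjj - sii)
    theta = phi + math.pi / 2.0  # the normal is perpendicular to the line
    rho = mi * math.sin(theta) + mj * math.cos(theta)
    if rho < 0:  # keep rho non-negative
        rho, theta = -rho, theta - math.pi
    return rho, theta
```

Fitting one such line to the edge pixels near each side of the selected quadrilateral and intersecting adjacent lines would yield refined corner estimates.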
In step 311, based upon the refined corner locations, perspective correction is performed. Suitable perspective correction algorithms may utilize a polygon mapping technique. Specifically, each pixel in the imaged document may be mapped to a polygon in the enhanced image, where the shape of the polygon is dependent upon the position of the pixel and the positions of the final corner locations.
Because the processing of flowchart 300 has been discussed at a relatively high-level for the convenience of the reader, it is now appropriate to discuss processing of certain steps of flowchart 300 in greater detail. As previously discussed with respect to step 305 of flowchart 300, edge detection may be performed by processing of the work copy of the luminance image to enhance spatial discontinuities in the work copy and to suppress graphical data that is not related to such discontinuities. According to embodiments of the present invention, to perform the processing, the following gradient calculations are performed:
where $G^{I}_{i,j}$ is the orthogonal gradient component in the vertical direction at point (i,j), $G^{J}_{i,j}$ is the orthogonal gradient component in the horizontal direction at point (i,j), $M_{i,j}$ is the magnitude of the gradient at point (i,j), $\theta_{i,j}$ is the angle of the gradient at point (i,j), and $y_{i,j}$ is the luminance of the work copy at point (i,j).
After performing the gradient calculations, the magnitude of the gradient at two points in the gradient direction may be linearly interpolated as described by the following pseudo-code:
From the linear interpolation, the existence of an edge is preferably detected (see step 305 of flowchart 300) by the following logic (where Te is a threshold value and equals five (5) for embodiments of the present invention):
In the preceding pseudo-code, it shall be appreciated that M1 and M2 are the respective gradient magnitudes at two points and Mij and θij are the linearly interpolated gradient parameters graphically depicted in
As previously discussed, line detection (step 307) preferably occurs after edge detection (step 305) and removal of noisy edges (step 306). To implement step 307 of flowchart 300, a Hough transform may be utilized to perform line detection from the detected edges and to assign a metric called Effective Length. The Hough transform uses the mathematical representation of ρ=i sin θ+j cos θ as illustrated in mathematical representation 501 of
Additionally, it shall be appreciated that this representation is consistent with the edge representation that uses the line's normal angle for the line's orientation. For embodiments of the present invention, the parameters (ρ, θ) are quantized into discrete levels. The quantized parameter space may be represented by a two-dimensional integer array Hρ,θ (the accumulator array) as depicted by quantized parameter space 601 of
To perform line detection (step 307 of flowchart 300) according to embodiments of the present invention, all entries of the array Hρ, θ are initialized to equal zero. For every edge pixel at location (i,j) that is marked by Mi,j>0, the following pseudo-code is performed to analyze the respective edge pixels:
In the preceding pseudo-code, TL is a parameter that sets the statistical criteria of the permissible deviation between an edge and line orientations. In embodiments of the present invention, TL is selected to equal 0.95, although other values may be utilized.
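A Hough accumulation with an orientation test of this kind may be sketched in Python as follows. The sketch is built on assumptions, since it does not reproduce the pseudo-code of the embodiments: the function name `hough_accumulate` is hypothetical, θ is quantized in one-degree steps, ρ is rounded to the nearest integer, and the orientation test is assumed to compare |cos(θ − θ_edge)| against TL.

```python
import math
from collections import defaultdict

def hough_accumulate(edge_pixels, t_l=0.95, theta_step_deg=1):
    """edge_pixels: iterable of (i, j, edge_theta), edge_theta in radians.

    Returns a dict mapping (rho_index, theta_deg) -> vote count, using
    rho = i*sin(theta) + j*cos(theta) rounded to the nearest integer.
    """
    acc = defaultdict(int)
    for i, j, edge_theta in edge_pixels:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            # Assumed orientation test: vote only when the line normal is
            # nearly parallel (either sign) to the edge's gradient angle.
            if abs(math.cos(theta - edge_theta)) <= t_l:
                continue
            rho = i * math.sin(theta) + j * math.cos(theta)
            acc[(int(round(rho)), theta_deg)] += 1
    return acc
```

Restricting the votes to orientations close to the measured gradient angle keeps each edge pixel from voting for lines it could not plausibly belong to, which reduces spurious peaks in the accumulator.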
After building the accumulator array, a minimum line length (Lmin) may be advantageously applied. In embodiments of the present invention, Lmin is selected to equal one twentieth (1/20) of the maximum side length of the edge map, although other values may be utilized. All entries of the array Hρ,θ that possess a count less than Lmin are set to zero.
Each non-zero entry of the array Hρ, θ represents a potential line for further analysis. The Effective Length (EL) of each candidate is calculated using the edge map. The Effective Length calculation begins by representing each potential line as the array x[ ] via the operations depicted in pseudo-code 701 of
The Effective Length (EL) of a candidate line may be computed from the array x[ ] as follows:
which equals the sum of the projected pixel length of the longest connected consecutive edge segment CCS within the array
which equals the sum of distances corresponding to the longest consecutive edge segment. After determining the Effective Length (EL) for a potential line, the respective position in the array Hρ,θ is replaced with the value EL.
After calculating the Effective Length (EL) for the respective candidates in the array Hρ,θ, a predetermined number of lines are selected from the array as candidate lines (step 307 of flowchart 300). The predetermined number of candidate lines may be selected according to pseudo-code 801 of
j=(ρ1 cos θ0−ρ0 cos θ1)/sin(θ1−θ0)
i=(ρ0 sin θ1−ρ1 sin θ0)/sin(θ1−θ0)
As previously noted, the possible quadrilaterals defined by the candidate lines of step 307 are analyzed in step 308 of
$$C_{N}^{4} = \frac{N\,(N-1)\,(N-2)\,(N-3)}{24}$$
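As a worked illustration of this count, the following Python sketch evaluates the expression and enumerates the line quadruples. The function names are hypothetical. For the thirty-five (35) candidate lines selected in step 307, the expression gives 35·34·33·32/24 = 52,360 combinations of four lines.

```python
from itertools import combinations
from math import comb

def count_quadrilaterals(n):
    """Number of ways to choose four of n candidate lines."""
    return comb(n, 4)

def line_quadruples(lines):
    """Yield every unordered set of four candidate lines."""
    return combinations(lines, 4)
```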
The following pseudo-code demonstrates an iterative computational process flow that may be utilized to evaluate each possible quadrilateral using the line representation of (ρN, θN, ELN):
Endfor
In the preceding pseudo-code, minArea is the minimum area required for a page quadrilateral (e.g., a quadrilateral that could correspond to an imaged document), which is selected to equal one sixteenth (1/16) of the area of the edge map according to embodiments of the invention.
Reference is now made to line graph 901 of
As previously discussed, from all of the possible quadrilaterals, M candidate quadrilaterals are selected for further analysis according to relative areas. In embodiments of the present invention, M is selected to equal thirty-five (35), although other values may be utilized. For each of the M candidates, an adjusted area computation is performed (see step 309 of
EA=(1+cs·Wc+es·We)·area,
where cs is a corner-matching score normalized to the range of [0.0,1.0], es is an edge-matching score that is also normalized, Wc is a weight for corner-matching, and We is a weight for edge-matching.
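The Effective Area computation and the selection of the largest value may be sketched as follows. This Python sketch is illustrative: the function names are hypothetical, and the weights Wc and We are left as parameters because no values are given for them here.

```python
def effective_area(area, cs, es, w_c, w_e):
    """EA = (1 + cs*Wc + es*We) * area, with cs and es in [0.0, 1.0]."""
    return (1.0 + cs * w_c + es * w_e) * area

def best_quadrilateral(candidates, w_c, w_e):
    """candidates: list of (area, cs, es) tuples; returns the max-EA one."""
    return max(candidates,
               key=lambda q: effective_area(q[0], q[1], q[2], w_c, w_e))
```

With hypothetical weights of 0.5 each, a quadrilateral of area 90 with perfect scores (EA of 180) would be preferred over a quadrilateral of area 100 with zero scores (EA of 100), which shows how well-matched corners and edges can outweigh raw area.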
For illustrative purposes, reference is now made to quadrilateral 1101 of
C=max(30,max(width, height)/12),
where width and height are the width and height of the respective edge map.
For each edge n of a candidate quadrilateral specified by two corner points (i0,j0) and (i1,j1) of (line ρ, θ), a working array x[0:L] is utilized according to the processing of pseudo-code 1201 of
where width and height are the dimensions of the edge map. Also, the parameter D of pseudo-code 1201 limits search processing while TL1 and TL2 are threshold values. In embodiments of the present invention, D is selected to equal five (5), TL1 is selected to equal 0.9, and TL2 is selected to equal 0.98, although other values may be utilized.
From the array created by pseudo-code 1201, two values may be computed for each set of ends (i0,j0) and (i1,j1). The first value is the corner pixel ratio (cprn,m) which equals: (the number of x(i).d≧0 (connected edge pixels) entries within the corner segment) divided by the corner size C as defined above. The second value is the corner matching score (csn,m) which equals: (the number of x(i).d≧0 within the corner plus the longest consecutive segment within the corner minus the square root of the sum of distances corresponding to the longest consecutive segment)/(2C). Additionally, it shall be observed that the index n may take the values of 0, 1, 2, and 3 for the four edges and the index m may take the values of 0 and 1 for the two ends of each edge.
Moreover, two values may be calculated for the respective edge (of length $\sqrt{(i_0-i_1)^2+(j_0-j_1)^2}$). The first value is the edge pixel ratio (eprn) which equals: (the number of x(i).d≧0 (connected edge pixels) entries within the edge)/L. The second value is a normalized edge-matching score (esn) which equals: (the longest connected consecutive segment within the edge minus the square root of the sum of distances corresponding to the longest consecutive segment minus the longest broken segment)/L. Additionally, it is noted that the index n may take the values of 0, 1, 2, and 3 for the four edges of the respective quadrilateral.
After these metrics are computed, the respective quadrilateral is tested against additional criteria. First, a corner of the respective quadrilateral exists if both of its corner pixel ratios cprn,0 and cprn,1 are above a threshold (0.7 according to embodiments of the present invention). Second, an edge exists for the respective quadrilateral if its edge pixel ratio eprn is above a threshold (0.65 according to embodiments of the present invention).
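The existence tests above, together with the candidate-group rule stated in the following paragraph (at least two corners and three edges), may be sketched in Python as follows. The function names are hypothetical, and the pairing of ratios at a corner is an assumption: the edges are taken to be ordered around the quadrilateral so that end 1 of edge n and end 0 of edge (n+1) mod 4 meet at the same corner.

```python
def count_corners_and_edges(cpr, epr, t_corner=0.7, t_edge=0.65):
    """cpr[n][m]: corner pixel ratio at end m (0 or 1) of edge n (0..3).
    epr[n]: edge pixel ratio of edge n.

    Assumption: edges are ordered around the quadrilateral so that end 1
    of edge n and end 0 of edge (n + 1) % 4 meet at the same corner.
    """
    corners = sum(
        1 for n in range(4)
        if cpr[n][1] > t_corner and cpr[(n + 1) % 4][0] > t_corner
    )
    edges = sum(1 for n in range(4) if epr[n] > t_edge)
    return corners, edges

def is_candidate(cpr, epr):
    """Candidate-group rule: at least two corners and three edges exist."""
    corners, edges = count_corners_and_edges(cpr, epr)
    return corners >= 2 and edges >= 3
```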
A respective quadrilateral is selected for the candidate group if it has at least two corners and three edges deemed to exist by the preceding criteria. For each quadrilateral, a normalized corner-matching score and an edge-matching score are computed, for use in the Effective Area (EA) metric that was previously discussed, as follows:
To implement step 310 of
As previously noted, after the corner locations are determined, the imaged document is processed by one of a number of suitable perspective correction algorithms that are known in the art. The selected perspective correction algorithm preferably processes the imaged document to cause the imaged document to be rectangular in shape and to occupy substantially all of the viewable area of the final image area. Suitable perspective correction algorithms may utilize a polygon mapping technique. Specifically, each pixel in the imaged document may be mapped to a polygon in the enhanced image, where the shape of the polygon is dependent upon the position of the pixel and the positions of the final corner locations.
When implemented via executable instructions, various elements of the present invention are in essence the code defining the operations of such various elements. The executable instructions or code may be obtained from a readable medium (e.g., hard drive media, optical media, EPROM, EEPROM, tape media, cartridge media, and/or the like) or communicated via a data signal from a communication medium (e.g., the Internet). Embodiments of the present invention may be implemented utilizing other programmable logic such as logic gate implementations, integrated circuit designs, and/or the like. In fact, readable media can include any medium that can store or transfer information.
By applying a suitable perspective correction algorithm and automatic document detection according to embodiments of the present invention, captured image 150 of
Number | Date | Country | |
---|---|---|---|
20040012679 A1 | Jan 2004 | US |