Embodiments of the present invention relate generally to methods, systems and apparatus for correcting artifacts in a camera-captured document image and, in particular, to methods, systems and apparatus for correcting perspective distortion in the camera-captured document image.
Estimation of the planar pose of a camera-captured document image may be required to correct for a non-ideal camera position. Ideally, a camera's imaging plane should be parallel to the document plane in order to minimize the geometric distortions introduced by the perspective projection of the scene, through the camera's optics, onto the imaging plane. If the camera is tilted with respect to the document, the text characters that are farther away from the camera may appear shortened relative to the text characters that are closer to the camera. This non-uniformity of the text shapes may lower the accuracy of Optical Character Recognition (OCR) algorithms and other document-processing algorithms, and may be less preferred by human readers. Scene clutter may add noise and spurious structure to the patterns detected within the camera-captured document image, making it difficult to obtain an accurate pose estimate. Additionally, unknown camera parameters may contribute to the difficulty of recovering an accurate pose estimate. Thus, methods, systems and apparatus for reliable pose estimation of a camera-captured document image may be desirable. Further, correction, using the pose estimate, of the distortion in the camera-captured document image may be desirable. The process of correcting an image for a non-ideal camera position may be referred to as geometric rectification.
Some embodiments of the present invention comprise methods, systems and apparatus for correcting perspective distortion in a camera-captured document image.
According to a first aspect of the present invention, a bounding quadrilateral may be determined from horizontal vanishing information, vertical vanishing information and the results of corner detection. A plurality of geometric rectification quality measure values may be determined for a rectification hypothesis associated with the bounding quadrilateral. A rectification hypothesis for correcting the camera-captured document image may be selected from among the rectification hypothesis associated with the bounding quadrilateral and one, or more, additional rectification hypotheses.
According to a second aspect of the present invention, horizontal vanishing information may be estimated using horizontal line groups. Horizontal vanishing information may be a horizontal vanishing point or a horizontal vanishing direction.
According to a third aspect of the present invention, vertical vanishing information may be estimated using vertical line groups. Vertical vanishing information may be a vertical vanishing point or a vertical vanishing direction.
According to a fourth aspect of the present invention, corner detection may be used to bound a quadrilateral region-of-interest.
According to a fifth aspect of the present invention, selection of the rectification hypothesis for correcting the camera-captured document image may be performed hierarchically.
The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention taken in conjunction with the accompanying drawings.
Embodiments of the present invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The figures listed above are expressly incorporated as part of this detailed description.
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the methods, systems and apparatus of the present invention is not intended to limit the scope of the invention, but it is merely representative of the presently preferred embodiments of the invention.
Elements of embodiments of the present invention may be embodied in hardware, firmware and/or a non-transitory computer program product comprising a computer-readable storage medium having instructions stored thereon/in which may be used to program a computing system. While exemplary embodiments revealed herein may only describe one of these forms, it is to be understood that one skilled in the art would be able to effectuate these elements in any of these forms while resting within the scope of the present invention.
Although the charts and diagrams in the figures may show a specific order of execution, it is understood that the order of execution may differ from that which is depicted. For example, the order of execution of the blocks may be changed relative to the shown order. Also, as a further example, two or more blocks shown in succession in a figure may be executed concurrently, or with partial concurrence. It is understood by those with ordinary skill in the art that a non-transitory computer program product comprising a computer-readable storage medium having instructions stored thereon/in which may be used to program a computing system, hardware and/or firmware may be created by one of ordinary skill in the art to carry out the various logical functions described herein.
Embodiments of the present invention comprise methods, systems and apparatus for rectifying a distorted, camera-captured document image.
Some embodiments of the present invention may be understood in relation to
An image may be received 102 in a processor module. The received image may comprise a plurality of pixels, wherein each pixel may comprise an associated value, or values. Vanishing points, in the received image, may be estimated 104 using the received image.
Vanishing-point estimation 104 may be understood in relation to
In some embodiments of the present invention, edge detection 200 may be performed on the received image using a contour edge-detection method, for example, the Canny edge detector or another contour edge detector that may complete linear structures through weak-contrast areas, resulting in an edge mask. In alternative embodiments of the present invention, edge detection 200 may be performed on the received image using a gradient-based edge-detection method, for example, a Sobel edge detector, a Laplace edge detector, a Scharr edge detector or another gradient-based edge detector.
From the results of the edge detection, significant linear structures may be extracted 202. Significant linear structures may result, for example, from document boundaries, interior region boundaries, graphic frames, image frames, text lines and paragraph formatting. In some embodiments of the present invention, extracting significant linear structures 202 may comprise applying a Hough transform with line linking to the edge mask generated by the edge detection. In some embodiments, fragmented line segments separated by less than a separation threshold may be linked. In some embodiments of the present invention, line segments with a length less than a length threshold may be rejected. In some embodiments of the present invention, an occupancy measure, which may be denoted Lfragmentation, for a linear structure may be defined as the ratio of the total length of the line gaps to the length of the line, and may be determined according to:

$$L_{fragmentation} = \frac{\sum_{k} g_k}{L_{length}},$$

where $g_k$ denotes the length of the kth line gap in the linear structure and $L_{length}$ denotes the length of the linear structure.
In some embodiments, the linear structure may be accepted as a significant linear structure when the occupancy measure, Lfragmentation, meets an occupancy criterion relative to an occupancy threshold, which may be denoted Tfragmentation, and the length, which may be denoted Llength, of the linear structure meets a length criterion relative to a length threshold, which may be denoted Tlength, for example, if:
Lfragmentation<Tfragmentation and Llength≧Tlength.
Exemplary threshold values are Tfragmentation=0.3 and Tlength=25. Alternatively, the number of gaps, which may be denoted Lnumgaps, in a linear structure may be defined as the number of line gaps in the linear structure, and a linear structure may be rejected as a significant linear structure when the number of line gaps meets a gap-count criterion relative to a gap-count threshold, which may be denoted Tnumgaps, for example, when Lnumgaps>Tnumgaps. In an exemplary embodiment, Tnumgaps=3.
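The acceptance rules above can be sketched as follows. This is a minimal illustration, assuming that the line-linking step already reports each linked structure's length and the lengths of its internal gaps; the `LinearStructure` record and the function names are hypothetical, and the default thresholds are the exemplary values given above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinearStructure:
    """Hypothetical record produced by Hough-based line linking."""
    length: float                                           # L_length, in pixels
    gap_lengths: List[float] = field(default_factory=list)  # one entry per line gap


def fragmentation(ls: LinearStructure) -> float:
    """L_fragmentation: total gap length divided by the structure length."""
    if ls.length <= 0:
        return float("inf")
    return sum(ls.gap_lengths) / ls.length


def passes_occupancy_test(ls: LinearStructure,
                          t_fragmentation: float = 0.3,
                          t_length: float = 25.0) -> bool:
    """Accept when L_fragmentation < T_fragmentation and L_length >= T_length."""
    return fragmentation(ls) < t_fragmentation and ls.length >= t_length


def passes_gap_count_test(ls: LinearStructure, t_numgaps: int = 3) -> bool:
    """Alternative rule: reject when L_numgaps > T_numgaps."""
    return len(ls.gap_lengths) <= t_numgaps
```

For example, a 100-pixel structure with gaps of 10 and 5 pixels has an occupancy measure of 0.15 and passes the first test, while the same structure with four gaps would fail the gap-count test.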
In some embodiments of the present invention, line fragmentation may be measured for each significant linear structure, and linear structures with significant fragmentation may be rejected as unreliable estimators of the vanishing points. In an exemplary embodiment, line fragmentation may be measured by applying a distance transform to the edge mask. In some embodiments, the value of the distance-transformed edge mask at a pixel may be the distance, according to a distance metric, from that pixel to the nearest edge pixel. The fragmentation measure of a detected line may be the sum of the values of the distance-transformed edge mask over the pixels that the line crosses. Because the distance transform is zero on edge pixels and positive in the gaps between them, the value of the fragmentation measure will be larger for fragmented structures than for solid, connected lines.
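A minimal sketch of the distance-transform fragmentation measure follows. It assumes a city-block (L1) distance metric, which the text leaves open, and a binary edge mask stored as nested lists; the function names are hypothetical. Pixels along the line are sampled by rounding evenly spaced points between the two endpoints.

```python
from typing import List, Tuple

INF = float("inf")


def city_block_distance_transform(mask: List[List[int]]) -> List[List[float]]:
    """Two-pass city-block distance from each pixel to the nearest edge pixel (mask == 1)."""
    h, w = len(mask), len(mask[0])
    d = [[0.0 if mask[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d


def line_fragmentation(dt: List[List[float]],
                       p0: Tuple[int, int], p1: Tuple[int, int]) -> float:
    """Sum of distance-transform values at the pixels sampled along the segment p0-p1."""
    (x0, y0), (x1, y1) = p0, p1
    n = max(abs(x1 - x0), abs(y1 - y0))
    total = 0.0
    for k in range(n + 1):
        t = k / n if n else 0.0
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        total += dt[y][x]
    return total
```

A solid edge row yields a measure of zero, while a two-pixel gap in the same row contributes one unit per gap pixel, so the gapped line scores 2.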
Significant linear structures may be grouped 204 into a horizontal line set, also considered a horizontal line group, and a vertical line set, also considered a vertical line group. The horizontal line group may include all of the significant linear structures that converge to a horizontal vanishing point, if such a point exists. The vertical line group may include all of the significant linear structures that converge to a vertical vanishing point, if such a point exists.
The horizontal line group, which may be denoted LH, may be the set consisting of all significant linear structures, Li, from the set of significant linear structures, Lsig, such that $|\vec{n}_i \cdot \vec{v}| \le |\vec{n}_i \cdot \vec{h}|$, where · denotes the vector dot product, |·| denotes the absolute value operator, $\vec{n}_i$ denotes the normal of Li, and $\vec{v}$ and $\vec{h}$ denote the nominal horizontal and nominal vertical line normal directions, respectively. This may be denoted:
$$L_H = \{L_i \in L_{sig} : |\vec{n}_i \cdot \vec{v}| \le |\vec{n}_i \cdot \vec{h}|\}.$$
The vertical line group, which may be denoted LV, may be the set consisting of all significant linear structures, Li, from the set of significant linear structures, Lsig, such that $|\vec{n}_i \cdot \vec{v}| > |\vec{n}_i \cdot \vec{h}|$. This may be denoted:

$$L_V = \{L_i \in L_{sig} : |\vec{n}_i \cdot \vec{v}| > |\vec{n}_i \cdot \vec{h}|\}.$$
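The grouping rule can be sketched as below. The text does not reproduce the reference vectors $\vec{v}$ and $\vec{h}$, so the values used here are assumptions, chosen so that the stated inequality places near-horizontal lines (whose normals point roughly along the y axis) in LH. Lines are assumed to be in the implicit form ax+by+c=0, with (a, b) taken as the normal; the names are hypothetical.

```python
import math
from typing import List, Tuple

Line = Tuple[float, float, float]  # implicit form a*x + b*y + c = 0

# HYPOTHETICAL reference vectors (not reproduced in the text): chosen so that
# |n.v| <= |n.h| holds for lines whose normal points roughly along the y axis,
# i.e. for near-horizontal lines.
V_REF = (1.0, 0.0)
H_REF = (0.0, 1.0)


def unit_normal(line: Line) -> Tuple[float, float]:
    a, b, _ = line
    s = math.hypot(a, b)
    return a / s, b / s


def group_lines(lines: List[Line]) -> Tuple[List[Line], List[Line]]:
    """Split significant lines into the horizontal group L_H and vertical group L_V."""
    l_h, l_v = [], []
    for line in lines:
        nx, ny = unit_normal(line)
        dot_v = abs(nx * V_REF[0] + ny * V_REF[1])
        dot_h = abs(nx * H_REF[0] + ny * H_REF[1])
        if dot_v <= dot_h:
            l_h.append(line)
        else:
            l_v.append(line)
    return l_h, l_v
```

Using absolute values of the dot products makes the test independent of the sign of each normal, which is arbitrary for a line in implicit form.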
The line groups, LH and LV, may be used to estimate 206 horizontal vanishing information, either a horizontal vanishing point or a horizontal vanishing direction, and to estimate 208 vertical vanishing information, either a vertical vanishing point or a vertical vanishing direction, respectively. In a document imaged under an oblique angle, parallel lines, under perspective projection, will converge to a vanishing point. Due to optical distortion and limitations in line estimation procedures, parallel lines may not always converge to a single point, and, in some cases, a degenerate condition may exist. In some embodiments of the present invention, the horizontal vanishing point, or horizontal vanishing direction, may be estimated first and then the vertical vanishing point, or vertical vanishing direction, may be estimated.
Vanishing-point estimation for a horizontal line group may be understood in relation to
For each line in a horizontal line group, LH, the implicit form, ax+by+c=0, of the line may be computed 300. A horizontal baseline measure may be calculated 302 for the horizontal line group. The horizontal baseline measure, which may be denoted baselineH, corresponding to the horizontal line group, LH, may be defined according to:
where di,j denotes the distance between the intersection of a first line, Li, in the horizontal line group, and a vertical line, V, passing through the center of the image, and the intersection of a second line, Lj, in the horizontal line group, and the vertical line, V, passing through the center of the image.
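The defining equation for baselineH is not reproduced in this text. The sketch below assumes that the baseline is the largest of the pairwise distances di,j, which follows the description of di,j but is not confirmed by it. The function name is hypothetical. The vertical line group would be handled in the same way, using the horizontal line through the image center.

```python
from itertools import combinations
from typing import List, Tuple

Line = Tuple[float, float, float]  # implicit form a*x + b*y + c = 0


def horizontal_baseline(lines: List[Line], img_width: float) -> float:
    """ASSUMED baseline_H: largest pairwise distance d_ij between the points where
    the lines cross the vertical line V through the image center (x = width / 2)."""
    xc = img_width / 2.0
    ys = []
    for a, b, c in lines:
        if b != 0:  # a line with b == 0 is vertical and does not cross V
            ys.append(-(a * xc + c) / b)
    return max((abs(yi - yj) for yi, yj in combinations(ys, 2)), default=0.0)
```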
The number of lines in the horizontal line group may be examined 304, and if there are 306 more than two lines in the horizontal line group, a coefficient matrix, which may be denoted L, may be formed 308. The coefficient matrix, L, may be formed by stacking the N row vectors Li=[ai bi ci], i=1, . . . , N, where N denotes the number of lines in the horizontal line group and ai, bi, ci denote the implicit-form coefficients of the ith line in the horizontal line group. A singular value decomposition (SVD) may be applied 310 to the coefficient matrix, L. A corresponding set of eigenvectors and eigenvalues, which may be denoted {{v1,λ1}, {v2,λ2}, {v3,λ3}}, where λ1>λ2>λ3, may result from the singular value decomposition. An eigen-ratio, which may be denoted Λ, of the smallest eigenvalue to the largest eigenvalue may be calculated 312 according to:

$$\Lambda = \frac{\lambda_3}{\lambda_1}.$$

The eigen-ratio, Λ, may be a measure of the convergence of the lines in the horizontal line group to a single vanishing point. A horizontal line group with a large eigen-ratio, Λ, may correspond to a group of lines whose pair-wise intersections vary significantly, while a horizontal line group with a small eigen-ratio, Λ, may correspond to a group of lines that generally intersect at substantially the same point.
The baseline measure for the horizontal line group may be compared 314 to a horizontal baseline threshold, which may be denoted THbaseline, and the eigen-ratio may be compared to a horizontal eigen-ratio threshold, which may be denoted THeigenratio. When a condition associated with a wide horizontal baseline and good line convergence to a single point is met 316, a horizontal vanishing point may be calculated 318 using the eigenvector associated with the smallest eigenvalue. In some embodiments of the present invention, the condition associated with a wide horizontal baseline and good line convergence to a single point may be described by (baselineH>THbaseline)&(Λ<THeigenratio), where & is a logical “AND” operator. In an exemplary embodiment,
where imgH denotes the horizontal image dimension, and THeigenratio=0.0005.
When the condition associated with a wide horizontal baseline and good line convergence is met 316, the horizontal vanishing point may be calculated 318 according to:

$$vp_H = \left[\frac{v_3^{(1)}}{v_3^{(3)}},\ \frac{v_3^{(2)}}{v_3^{(3)}}\right],$$

where $v_3^{(1)}$, $v_3^{(2)}$ and $v_3^{(3)}$ are the first, second and third components, respectively, of the eigenvector $v_3=[v_3^{(1)}\ v_3^{(2)}\ v_3^{(3)}]$. The calculated horizontal vanishing point may be returned 320 from the horizontal-vanishing-point estimator.
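The SVD step and the conversion of v3 to a vanishing point can be sketched as shown here. To stay dependency-free, the sketch computes the eigen-decomposition of $L^T L$ with Jacobi rotations instead of calling a library SVD. The eigenvectors of $L^T L$ are the right singular vectors of L, and its eigenvalues are the squared singular values. Whether the exemplary threshold THeigenratio=0.0005 refers to singular values or to their squares is not stated, so the squared values used here are an assumption. The sketch also assumes that each line is normalized so that a²+b²=1. The function names are hypothetical, and the baseline test is left to the caller.

```python
import math
from typing import List, Optional, Tuple

Line = Tuple[float, float, float]  # implicit form a*x + b*y + c = 0


def jacobi_eigen(m: List[List[float]], sweeps: int = 50):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by descending eigenvalue;
    eigenvectors[k] is the unit eigenvector for eigenvalues[k].
    """
    n = len(m)
    a = [list(map(float, row)) for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        scale = sum(a[i][i] ** 2 for i in range(n)) or 1.0
        if off <= 1e-30 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def estimate_vanishing_point(lines: List[Line]) -> Tuple[float, Optional[Tuple[float, float]]]:
    """Return (eigen-ratio, vanishing point or None) for a group of three or more lines."""
    # ASSUMPTION: rows are normalized so that a^2 + b^2 = 1.
    rows = [(a / math.hypot(a, b), b / math.hypot(a, b), c / math.hypot(a, b))
            for a, b, c in lines]
    # M = L^T L; its smallest-eigenvalue eigenvector is v3.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    (lam1, _, lam3), vecs = jacobi_eigen(m)
    ratio = lam3 / lam1
    v3 = vecs[2]
    if abs(v3[2]) < 1e-12:  # homogeneous point at infinity: no finite vanishing point
        return ratio, None
    return ratio, (v3[0] / v3[2], v3[1] / v3[2])
```

For three lines that all pass through (100, 50), the eigen-ratio is essentially zero and the recovered point is (100, 50). When NumPy is available, `numpy.linalg.svd(L)` yields the same right singular vectors directly; the last row of the returned `Vh` corresponds to v3.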
When the condition associated with a wide horizontal baseline and good line convergence is not met 322, the mean of the normal lines to the lines in the horizontal line group, LH, may be calculated 324. For each line, Li, in the horizontal line group, LH, a normal line, which may be denoted ni, may be determined. The mean of the N normal lines may be determined according to:

$$\bar{n} = \frac{1}{N}\sum_{i=1}^{N} n_i.$$
The direction perpendicular to the mean of the normal lines may be determined 326 and returned 328, from the horizontal-vanishing-point estimator, as the horizontal vanishing direction.
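A minimal sketch of the fallback vanishing direction follows. Line normals are defined only up to sign, and averaging two opposite-signed normals would cancel them. The sketch therefore flips each normal into the same half-plane as the first before averaging, which is an assumption not stated in the text. The function name is hypothetical. The same function also covers the single-line case described below, where the result is simply the direction perpendicular to that line's normal.

```python
import math
from typing import List, Tuple

Line = Tuple[float, float, float]  # implicit form a*x + b*y + c = 0


def vanishing_direction(lines: List[Line]) -> Tuple[float, float]:
    """Unit direction perpendicular to the mean unit normal of a line group."""
    ref = None
    sx = sy = 0.0
    for a, b, _ in lines:
        s = math.hypot(a, b)
        nx, ny = a / s, b / s
        # ASSUMPTION: align normal signs so that opposite normals do not cancel.
        if ref is None:
            ref = (nx, ny)
        elif nx * ref[0] + ny * ref[1] < 0:
            nx, ny = -nx, -ny
        sx += nx
        sy += ny
    mx, my = sx / len(lines), sy / len(lines)
    d = math.hypot(mx, my)
    return -my / d, mx / d  # mean normal rotated by 90 degrees
```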
When the number of lines in the horizontal line group is examined 304, if there are not more than two lines in the horizontal line group 330, then the number of lines in the horizontal line group may be examined 332 to determine if there are exactly two lines in the horizontal line group. If there are 334 exactly two lines in the horizontal line group, then the horizontal baseline measure for the group may be compared 336 to a horizontal baseline threshold, which may be denoted THbaseline, and a measure of how parallel the two lines are may be compared to a parallel-measure threshold, which may be denoted THparallel. When a condition associated with a short horizontal baseline or very nearly parallel lines is met 344, a horizontal vanishing direction may be calculated 346 by determining the direction perpendicular to the average of the normals of the two lines in the horizontal line group. In some embodiments of the present invention, the measure of how parallel the two lines in the horizontal line group are may be measured by determining the angle between the normals to the lines. In some embodiments of the present invention, the condition associated with a short horizontal baseline or very nearly parallel lines may be described by (baselineH<THbaseline)|(angle<THparallel), where | is a logical “OR” operator. In an exemplary embodiment,
where imgH denotes the horizontal image dimension, and THparallel=1°. The horizontal vanishing direction may be returned 348. If the two lines are not nearly parallel and have a wide horizontal baseline 338, then the horizontal vanishing point may be calculated 340 by determining the intersection of the two lines in the horizontal line group, and the horizontal vanishing point may be returned 342.
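The two-line branch can be sketched as below. The intersection is taken as the cross product of the homogeneous line vectors [a b c], and the parallelism measure is the acute angle between the line normals. Because the exemplary THbaseline expression is not reproduced here, the baseline and its threshold are passed in as parameters. All names are hypothetical.

```python
import math
from typing import Optional, Tuple

Line = Tuple[float, float, float]  # implicit form a*x + b*y + c = 0


def _unit_normal(line: Line) -> Tuple[float, float]:
    a, b, _ = line
    s = math.hypot(a, b)
    return a / s, b / s


def normal_angle_deg(l1: Line, l2: Line) -> float:
    """Acute angle, in degrees, between the normals of two lines."""
    n1, n2 = _unit_normal(l1), _unit_normal(l2)
    cos_t = abs(n1[0] * n2[0] + n1[1] * n2[1])
    return math.degrees(math.acos(min(1.0, cos_t)))


def intersect(l1: Line, l2: Line) -> Optional[Tuple[float, float]]:
    """Intersection point from the cross product of the homogeneous line vectors."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - b1 * a2
    if w == 0.0:
        return None  # exactly parallel: the intersection is at infinity
    return (b1 * c2 - c1 * b2) / w, (c1 * a2 - a1 * c2) / w


def two_line_estimate(l1: Line, l2: Line, baseline: float, th_baseline: float,
                      th_parallel: float = 1.0):
    """Return ("direction", d) or ("point", p) following the two-line rule."""
    if baseline < th_baseline or normal_angle_deg(l1, l2) < th_parallel:
        n1, n2 = _unit_normal(l1), _unit_normal(l2)
        if n1[0] * n2[0] + n1[1] * n2[1] < 0:  # align signs before averaging
            n2 = (-n2[0], -n2[1])
        mx, my = n1[0] + n2[0], n1[1] + n2[1]
        d = math.hypot(mx, my)
        return "direction", (-my / d, mx / d)  # perpendicular to the mean normal
    return "point", intersect(l1, l2)
```

For example, the lines y=0 and y=x-100 meet at 45 degrees, so with a wide baseline the function returns their intersection, (100, 0).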
When the number of lines in the horizontal line group is examined 332, if there are not exactly two lines in the horizontal line group 350, then the number of lines in the horizontal line group may be examined 352 to determine if there is exactly one line in the horizontal line group. If there is 354 exactly one line in the horizontal line group, then the direction perpendicular to the line normal is determined 356 and returned 358 as the horizontal vanishing direction. If there is not 360 exactly one line in the horizontal line group, then the vanishing direction for the horizontal line group is returned 362 as the cardinal horizontal direction, [1 0].
Vanishing-point estimation for a vertical line group may be understood in relation to
For each line in a vertical line group, LV, the implicit form, ax+by+c=0, of the line may be computed 400. A vertical baseline measure may be calculated 402 for the vertical line group. The vertical baseline measure, which may be denoted baselineV, corresponding to the vertical line group, LV, may be defined according to:
where di,j denotes the distance between the intersection of a first line, Li, in the vertical line group, and a horizontal line, H, passing through the center of the image, and the intersection of a second line, Lj, in the vertical line group, and the horizontal line, H, passing through the center of the image.
The number of lines in the vertical line group may be examined 404, and if there are 406 more than two lines in the vertical line group, a coefficient matrix, which may be denoted L, may be formed 408. The coefficient matrix, L, may be formed by stacking the N row vectors Li=[ai bi ci], i=1, . . . , N, where N denotes the number of lines in the vertical line group and ai, bi, ci denote the implicit-form coefficients of the ith line in the vertical line group. A singular value decomposition (SVD) may be applied 410 to the coefficient matrix, L. A corresponding set of eigenvectors and eigenvalues, which may be denoted {{v1,λ1}, {v2,λ2}, {v3,λ3}}, where λ1>λ2>λ3, may result from the singular value decomposition. An eigen-ratio, which may be denoted Λ, of the smallest eigenvalue to the largest eigenvalue may be calculated 412 according to:

$$\Lambda = \frac{\lambda_3}{\lambda_1}.$$

The eigen-ratio, Λ, may be a measure of the convergence of the lines in the vertical line group to a single vanishing point. A vertical line group with a large eigen-ratio, Λ, may correspond to a group of lines whose pair-wise intersections vary significantly, while a vertical line group with a small eigen-ratio, Λ, may correspond to a group of lines that generally intersect at substantially the same point.
The baseline measure for the vertical line group may be compared 414 to a vertical baseline threshold, which may be denoted TVbaseline, and the eigen-ratio may be compared to a vertical eigen-ratio threshold, which may be denoted TVeigenratio. When a condition associated with a wide vertical baseline and good line convergence to a single point is met 416, a vertical vanishing point may be calculated 418 using the eigenvector associated with the smallest eigenvalue. In some embodiments of the present invention, the condition associated with a wide vertical baseline and good line convergence to a single point may be described by (baselineV>TVbaseline)&(Λ<TVeigenratio), where & is a logical “AND” operator. In an exemplary embodiment,
where imgV denotes the vertical image dimension, and TVeigenratio=0.0005.
When the condition associated with a wide vertical baseline and good line convergence is met 416, the vertical vanishing point may be calculated 418 according to:

$$vp_V = \left[\frac{v_3^{(1)}}{v_3^{(3)}},\ \frac{v_3^{(2)}}{v_3^{(3)}}\right],$$

where $v_3^{(1)}$, $v_3^{(2)}$ and $v_3^{(3)}$ are the first, second and third components, respectively, of the eigenvector $v_3=[v_3^{(1)}\ v_3^{(2)}\ v_3^{(3)}]$. The calculated vertical vanishing point may be returned 420 from the vertical-vanishing-point estimator.
When the condition associated with a wide vertical baseline and good line convergence is not met 422, the mean of the normal lines to the lines in the vertical line group may be calculated 424. For each line, Li, in the vertical line group, a normal line, which may be denoted ni, may be determined. The mean of the N normal lines may be determined according to:

$$\bar{n} = \frac{1}{N}\sum_{i=1}^{N} n_i.$$

The direction perpendicular to the mean of the normal lines may be determined 426 and returned 428, from the vertical-vanishing-point estimator, as the vertical vanishing direction.
When the number of lines in the vertical line group is examined 404, if there are not more than two lines in the vertical line group 430, then the number of lines in the vertical line group may be examined 432 to determine if there are exactly two lines in the vertical line group. If there are 434 exactly two lines in the vertical line group, then the vertical baseline measure for the group may be compared 436 to a vertical baseline threshold, which may be denoted TVbaseline, and a measure of how parallel the two lines are may be compared to a parallel-measure threshold, which may be denoted TVparallel. When a condition associated with a short vertical baseline or very nearly parallel lines is met 444, a vertical vanishing direction may be calculated 446 by determining the direction perpendicular to the average of the normals of the two lines in the vertical line group. In some embodiments of the present invention, the measure of how parallel the two lines in the vertical line group are may be measured by determining the angle between the normals to the lines. In some embodiments of the present invention, the condition associated with a short vertical baseline or very nearly parallel lines may be described by (baselineV<TVbaseline)|(angle<TVparallel), where | is a logical “OR” operator. In an exemplary embodiment,
where imgV denotes the vertical image dimension, and TVparallel=1°. The vertical vanishing direction may be returned 448. If the two lines are not nearly parallel and have a wide vertical baseline 438, then the vertical vanishing point may be calculated 440 by determining the intersection of the two lines in the vertical line group, and the vertical vanishing point may be returned 442.
When the number of lines in the vertical line group is examined 432, if there are not exactly two lines in the vertical line group 450, then the number of lines in the vertical line group may be examined 452 to determine if there is exactly one line in the vertical line group. If there is 454 exactly one line in the vertical line group, then the direction perpendicular to the line normal is determined 456 and returned 458 as the vertical vanishing direction. If there is not 460 exactly one line in the vertical line group, then the vanishing direction for the vertical line group is returned 462 as the direction orthogonal to the horizontal vanishing direction.
Referring again to
Results from the vanishing-point detection and the corner detection may be used by a quadrilateral-boundary detector to perform 108 quadrilateral-boundary detection. A document quadrilateral, also considered a perspective-distorted document rectangle, may be formed by the quadrilateral-boundary detector. In some embodiments of the present invention, the document quadrilateral may be a bounding quadrilateral whose sides converge to the estimated vanishing points, if vanishing points were returned, or are parallel to the estimated vanishing directions, if vanishing directions were returned. The bounding quadrilateral may also encompass the corners detected by the corner detector. In some embodiments of the present invention, all detected corners must be encompassed by the bounding quadrilateral. In alternative embodiments, the bounding quadrilateral need not encompass all detected corners.
Geometric rectification quality measures may be determined 110 from the estimated bounding quadrilateral and the vanishing points, or vanishing directions. The geometric rectification quality measures may measure the geometric rectification strength and the numerical stability of a pose estimate and a camera parameter estimate. The geometric rectification quality measures may be used to compare multiple estimates.
In some embodiments of the present invention, when the camera that captured the document image has an unknown effective focal length, measured in pixels, a full geometric rectification may only be performed when two vanishing points are detected. In some embodiments of the present invention, multiple focal length measures may be calculated to determine if the document bounding quadrilateral and corresponding vanishing points may be considered to have a reliable focal estimate.
In some embodiments of the present invention, a first focal length measure, which may be denoted #vp, may be associated with the quality of the focal length estimate. The first focal length measure may be determined by counting the number of vanishing points, excluding those whose boundary lines are nearly parallel. These embodiments may be understood in relation to
where nh1 518 and nh2 520 are the unit normals to the horizontal sides 510, 512, respectively, and · denotes the vector dot product. The angle, in degrees, between the vertical boundary pair may be determined according to:
where nv1 522 and nv2 524 are the unit normals to the vertical sides 514, 516, respectively, and · denotes the vector dot product. The first focal length measure may be determined according to:
where th∥ denotes a threshold below which lines may be considered parallel. In an exemplary embodiment, th∥=1 degree.
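The angle formulas and the count #vp are not reproduced in this text. The sketch below assumes that each boundary-pair angle is the acute angle between the two unit normals, computed from the absolute value of their dot product. It also assumes that #vp counts the boundary pairs whose angle is not below th∥, consistent with excluding nearly parallel boundary pairs. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def pair_angle_deg(n1: Vec, n2: Vec) -> float:
    """ASSUMED boundary-pair angle: acute angle in degrees between two unit normals."""
    dot = abs(n1[0] * n2[0] + n1[1] * n2[1])
    return math.degrees(math.acos(min(1.0, dot)))


def count_vanishing_points(nh1: Vec, nh2: Vec, nv1: Vec, nv2: Vec,
                           th_parallel: float = 1.0) -> int:
    """ASSUMED #vp: number of boundary pairs that are not nearly parallel."""
    angle_h = pair_angle_deg(nh1, nh2)
    angle_v = pair_angle_deg(nv1, nv2)
    return int(angle_h >= th_parallel) + int(angle_v >= th_parallel)
```

With horizontal sides 5 degrees apart and exactly parallel vertical sides, only the horizontal pair yields a vanishing point, so #vp is 1.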
In some embodiments of the present invention, which may be understood in relation to
vpCH=vpH−O,
where vpCH denotes the horizontal vanishing point defined relative to the image optical center 530, O, vpH denotes the horizontal vanishing point, and the image optical center 530, O, and the horizontal vanishing point, vpH, are in image coordinates. In an exemplary embodiment, a vertical vanishing point defined relative to the image optical center 530 may be determined according to:
vpCV=vpV−O,
where vpCV denotes the vertical vanishing point defined relative to the image optical center 530, O, vpV denotes the vertical vanishing point, and the image optical center 530, O, and the vertical vanishing point, vpV, are in image coordinates. The second focal length measure, which may be denoted farVPDist, may be determined according to:
farVPDist=max (∥vpCH∥,∥vpCV∥),
where ∥·∥ denotes a norm operator. In some embodiments of the present invention, the L2 norm may be used.
In some embodiments of the present invention, a third focal length measure, which may be denoted vptest, may be calculated. The third focal length measure may be determined according to:
vptest=vpCH·vpCV,
where · denotes the vector dot product and vpCH and vpCV denote the horizontal vanishing point with respect to the image optical center and the vertical vanishing point with respect to the image optical center, respectively, if they exist.
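The second and third focal length measures can be sketched as follows, using the L2 norm for farVPDist. The sketch assumes that both vanishing points exist and that the optical center O is supplied in image coordinates, for example the image center, which the text does not specify. The function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def relative_to_center(vp: Point, center: Point) -> Point:
    """vpC = vp - O, with both points in image coordinates."""
    return vp[0] - center[0], vp[1] - center[1]


def focal_measures(vp_h: Point, vp_v: Point, center: Point) -> Tuple[float, float]:
    """Return (farVPDist, vptest) for a horizontal/vertical vanishing-point pair."""
    vpc_h = relative_to_center(vp_h, center)
    vpc_v = relative_to_center(vp_v, center)
    far_vp_dist = max(math.hypot(*vpc_h), math.hypot(*vpc_v))  # L2 norm
    vptest = vpc_h[0] * vpc_v[0] + vpc_h[1] * vpc_v[1]         # dot product
    return far_vp_dist, vptest
```

For example, with O=(320, 240), vpH=(−2680, 340) and vpV=(420, −1760), the centered vectors are (−3000, 100) and (100, −2000), which gives vptest=−500000.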
A focal length estimate may be defined according to:
A fourth focal length measure, which may be denoted FOV, may be defined if the estimated focal length, $\hat{f}$, is defined. The fourth focal length measure, FOV, may measure the field of view, in degrees, for the maximum image dimension, and may be calculated according to:
where imgW and imgH denote the image width and the image height, respectively.
In some embodiments of the present invention, the focal length estimate, $\hat{f}$, associated with a document quadrilateral and the corresponding vanishing points may be considered reliable when a focal-length-estimate reliability condition is met. In some embodiments of the present invention, the focal length estimate, $\hat{f}$, may be considered reliable when estF=(#vp==2) and (vptest<0) and (FOV>thFOV) and (farVPDist<thVPDist) is true, where thFOV and thVPDist are a field-of-view threshold value and a maximum-distance threshold value, respectively, and estF is a Boolean indicator of the reliability of the focal length estimate. In some embodiments of the present invention, thFOV=20° and thVPDist=$5\times10^{4}$.
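The focal length and field-of-view formulas are not reproduced in this text, so the sketch below rests on two assumptions. First, it uses the standard orthogonal-vanishing-point relation. If the vanishing points of two perpendicular document directions are lifted to the camera-space vectors [vpCH, f] and [vpCV, f], their orthogonality gives vpCH·vpCV + f² = 0, and so $\hat{f}=\sqrt{-vptest}$. This also explains why the reliability condition requires vptest<0. Second, it uses the pinhole field of view FOV = 2·atan(max(imgW, imgH)/(2$\hat{f}$)). The function names are hypothetical, and the default thresholds are the exemplary values above.

```python
import math
from typing import Optional


def focal_estimate(vptest: float) -> Optional[float]:
    """ASSUMED relation f^2 = -(vpCH . vpCV) for orthogonal vanishing directions."""
    return math.sqrt(-vptest) if vptest < 0 else None


def field_of_view_deg(f_hat: float, img_w: int, img_h: int) -> float:
    """ASSUMED pinhole field of view, in degrees, across the larger image dimension."""
    return math.degrees(2.0 * math.atan(max(img_w, img_h) / (2.0 * f_hat)))


def est_f(num_vp: int, vptest: float, far_vp_dist: float, img_w: int, img_h: int,
          th_fov: float = 20.0, th_vp_dist: float = 5e4) -> bool:
    """estF = (#vp == 2) and (vptest < 0) and (FOV > thFOV) and (farVPDist < thVPDist)."""
    f_hat = focal_estimate(vptest)
    if f_hat is None:  # vptest >= 0: no real focal length, so the test fails
        return False
    fov = field_of_view_deg(f_hat, img_w, img_h)
    return num_vp == 2 and fov > th_fov and far_vp_dist < th_vp_dist
```

With vptest=−500000 on a 640×480 image, $\hat{f}\approx707$ pixels and the field of view is roughly 49°, so the estimate passes. The estimate fails if only one vanishing point is found or if the vanishing points are so far out that the implied field of view drops below 20°.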
In some embodiments of the present invention, described in relation to
where VV 532 and VH 534 denote the unit vectors from the optical center in the direction of the vertical and horizontal vanishing points, respectively. If a vertical vanishing point does not exist, then the vertical vanishing direction may be defined as:
If a horizontal vanishing point does not exist, then the horizontal vanishing direction may be defined as:
In some embodiments of the present invention, a measure of correction strength may be determined. In some exemplary embodiments, the correction strength measure may be the angle between the optical axis and the estimated document-plane normal predicted from the document quadrilateral. For a camera with a known focal length, the document normal may be estimated according to:
is a three-element vector formed from the estimated vanishing point, vpCH, and focal length, $\hat{f}$, when the vanishing point exists, or vanishing direction, VH, when the vanishing point does not exist, and
where
is a three-element vector formed from the estimated vanishing point, vpCV, and focal length, $\hat{f}$, when the vanishing point exists, or vanishing direction, VV, when the vanishing point does not exist.
In some embodiments of the present invention, when estF is “false,” then a default focal length may be determined according to:
where a default field-of-view is 50 degrees.
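One reading of this step is that the default focal length is the one that produces a 50-degree field of view for the given image size. The equation itself is not reproduced in this text, so the sketch below rests on an explicit assumption: the field of view is taken to span the image diagonal, giving $\hat{f} = (\sqrt{imgW^2 + imgH^2}/2)/\tan(\mathrm{FOV}/2)$. An implementation that measures the field of view across the width or height would substitute that dimension. The function name is hypothetical.

```python
import math


def default_focal_length(img_w, img_h, fov_deg=50.0):
    """Focal length (pixels) giving a fov_deg field of view across the image diagonal.

    ASSUMPTION: the field of view is measured along the image diagonal;
    the source equation may use a different image dimension.
    """
    half_diag = math.hypot(img_w, img_h) / 2.0
    return half_diag / math.tan(math.radians(fov_deg) / 2.0)
```

A wider assumed field of view yields a shorter default focal length, which in turn makes the normal estimate more sensitive to vanishing-point position.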
A correction angle may be determined according to:
where z=[0,0,1].
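The normal and correction-angle computation can be sketched as follows. Because the normal equation is not reproduced here, this sketch assumes the common construction in which the document normal is the normalized cross product of the horizontal and vertical three-element vectors, each built as $[x, y, \hat{f}]$ from a vanishing point expressed relative to the optical center (or taken directly as the vanishing direction $V_H$ or $V_V$ when no vanishing point exists). The correction angle is then the angle between that normal and $z=[0,0,1]$; the absolute value of the dot product is used because the sign of the normal is arbitrary. All function names are hypothetical.

```python
import math


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def direction_from_vp(vp, f_hat):
    """3-vector toward a vanishing point given relative to the optical center (assumed)."""
    return [vp[0], vp[1], f_hat]


def document_normal(h_dir, v_dir):
    """Unit normal of the plane spanned by the horizontal and vertical directions."""
    return _normalize(_cross(h_dir, v_dir))


def correction_angle_deg(normal):
    """Angle, in degrees, between the document normal and the optical axis z."""
    z = [0.0, 0.0, 1.0]
    cos_a = abs(sum(a * b for a, b in zip(normal, z)))
    return math.degrees(math.acos(min(1.0, cos_a)))  # clamp guards rounding error
```

For a document tilted 30 degrees about the horizontal image axis with $\hat{f}=1000$, the horizontal direction stays $[1,0,0]$, the vertical vanishing point lies at $(0, 1000\cot 30^\circ)$, and the sketch returns a correction angle of approximately 30 degrees. A fronto-parallel document, with directions $[1,0,0]$ and $[0,1,0]$, returns 0.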
Referring to
Each rectification hypothesis and its corresponding vanishing points may be processed to determine the associated geometric rectification quality measures, ($\text{correctionAngle}_i$, $\text{vpAngle}_i$, $\#vp_i$), where the subscript $i$ denotes the associated hypothesis, which may be denoted $H_i$. The hypothesis associated with the normal pose may be denoted $H'$. In some embodiments of the present invention, an overlap measure may be calculated by intersecting the document area defined by the document quadrilateral, which may be denoted $Q_i$, with the set of detected corners, which may be denoted $C$, to determine the percentage of corners within the document area. The overlap measure for a hypothesis, which may be denoted $\text{overlap}_i$, may be determined according to:
where ∩ denotes an intersection operator.
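The overlap measure can be computed with a point-in-polygon test. The sketch below assumes that $Q_i$ is a convex quadrilateral given as four (x, y) vertices in order and that $C$ is a list of (x, y) corner locations; it counts corners on the boundary as inside and returns 0 when no corners were detected, both of which are illustrative choices rather than details from the text. Function names are hypothetical.

```python
def _inside_convex_quad(pt, quad):
    """True if pt lies inside (or on the boundary of) the convex quadrilateral quad."""
    sign = 0
    for k in range(4):
        x1, y1 = quad[k]
        x2, y2 = quad[(k + 1) % 4]
        # z-component of (edge vector) x (vector from edge start to pt)
        cross = (x2 - x1) * (pt[1] - y1) - (y2 - y1) * (pt[0] - x1)
        if cross != 0:
            s = 1 if cross > 0 else -1
            if sign == 0:
                sign = s
            elif s != sign:
                return False  # pt is on the outer side of this edge
    return True


def overlap_percent(quad, corners):
    """Percentage of detected corners that fall within the document quadrilateral."""
    if not corners:
        return 0.0
    inside = sum(1 for c in corners if _inside_convex_quad(c, quad))
    return 100.0 * inside / len(corners)
```

With the square $Q = [(0,0), (10,0), (10,10), (0,10)]$ and corners $(1,1)$, $(5,5)$, $(9,9)$ and $(20,20)$, three of the four corners lie inside, giving an overlap of 75%.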
In some embodiments of the present invention, no alternative bounding regions may be received. In these alternative embodiments, only the hypothesis associated with the normal pose and a hypothesis associated with the document quadrilateral and corresponding vanishing points, or vanishing directions, determined according to the line-group methods described herein, may be filtered.
In some embodiments of the present invention, two hypotheses may be examined in addition to the hypothesis, $H'$, associated with the normal pose. A first hypothesis, which may be denoted $H_1$, may be associated with the document quadrilateral and corresponding vanishing points, or vanishing directions, determined according to the line-group methods described herein. A second hypothesis, which may be denoted $H_2$, may be associated with a document region determined by a region-growing method. In some embodiments of the present invention, the alternative, second, hypothesis, $H_2$, may be formed using a region-growing technique, for example, the technique described in U.S. patent application Ser. No. 13/034,594, filed on Feb. 24, 2011, entitled “Methods and Systems for Determining a Document Region-of-Interest in an Image,” and invented by Ahmet Mufit Ferman and Lawrence Shao-hsien Chen. U.S. patent application Ser. No. 13/034,594 is hereby incorporated by reference herein in its entirety.
In some embodiments of the present invention, filtering, also referred to as verification, of a geometric rectification hypothesis may depend on the method used to derive the hypothesis. When a hypothesis is derived using line groups according to embodiments of the present invention described herein, the hypothesis, $H_1$, may be rejected when
$\text{correctionAngle}_1 > \theta_{\max}$,
where $\theta_{\max}$ may denote the maximum correction angle deemed reliable and wherein, in an exemplary embodiment, $\theta_{\max}=40^\circ$. The hypothesis, $H_1$, may also be rejected if the number of vanishing points is two and the vanishing-point angle associated with the hypothesis does not fall within a range of vanishing-point angles. In an exemplary embodiment, this condition may be:
$\#vp_1 = 2$ and ($\text{vpAngle}_1 < 80^\circ$ or $\text{vpAngle}_1 > 110^\circ$).
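The two rejection rules for the line-group hypothesis $H_1$ can be combined into one predicate. This is a sketch using the exemplary values $\theta_{\max}=40^\circ$ and the 80° to 110° vanishing-point-angle range; the function and parameter names are hypothetical.

```python
def reject_line_group_hypothesis(correction_angle_deg, num_vp, vp_angle_deg,
                                 theta_max_deg=40.0,
                                 vp_angle_range=(80.0, 110.0)):
    """Return True when the line-group hypothesis H1 should be rejected."""
    # Rule 1: the implied correction is too strong to be trusted.
    if correction_angle_deg > theta_max_deg:
        return True
    # Rule 2: with two vanishing points, their angle must be near-perpendicular.
    lo, hi = vp_angle_range
    if num_vp == 2 and (vp_angle_deg < lo or vp_angle_deg > hi):
        return True
    return False
```

The vanishing-point angle is only consulted when both vanishing points exist, so a hypothesis with a single vanishing point is judged on its correction angle alone.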
In some embodiments of the present invention comprising an alternative hypothesis derived using a region-growing technique, the hypothesis may be rejected if the correction angle is greater than an angle threshold and the overlap is less than an overlap threshold. In an exemplary embodiment, this condition may be:
$\text{correctionAngle}_2 > \theta_{\max}$ and $\text{overlap}_2 < th_C$,
where $\theta_{\max}$ may denote the maximum correction angle deemed reliable and $th_C$ may denote the percentage of corners that must lie within the document region for the hypothesis to be verified. In an exemplary embodiment, $\theta_{\max}=40^\circ$ and $th_C=98\%$. The hypothesis may also be rejected when the shape of the region returned by the region-growing technique does not sufficiently match a quadrilateral.
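A corresponding predicate for the region-growing hypothesis $H_2$ is sketched below with the exemplary thresholds $\theta_{\max}=40^\circ$ and $th_C=98\%$. The quadrilateral-shape test is not specified in this text, so it is assumed here to be computed elsewhere and passed in as a Boolean; all names are hypothetical.

```python
def reject_region_growing_hypothesis(correction_angle_deg, overlap_pct,
                                     is_quadrilateral,
                                     theta_max_deg=40.0, th_c_pct=98.0):
    """Return True when the region-growing hypothesis H2 should be rejected."""
    # A large correction angle is tolerated only if nearly all corners lie inside.
    if correction_angle_deg > theta_max_deg and overlap_pct < th_c_pct:
        return True
    # ASSUMPTION: the quadrilateral-shape check is performed elsewhere.
    return not is_quadrilateral
```

Note that the angle and overlap tests are joined by "and": a strong correction is still accepted when the region contains at least 98% of the detected corners.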
In some embodiments of the present invention, rectification hypothesis filtering 116 may comprise a hierarchical decision. If the region-growing-based hypothesis, $H_2$, is verified, then the region-growing-based hypothesis is selected for geometric correction. If the region-growing-based hypothesis, $H_2$, is not verified and the line-groups hypothesis, $H_1$, is verified, then the line-groups hypothesis, $H_1$, is selected for geometric correction. Otherwise, the normal hypothesis, $H'$, is selected, and geometric rectification of the image may be disabled.
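The hierarchical decision reduces to a short priority chain. In this sketch the verification results are assumed to come from the rejection conditions described above (a hypothesis is verified when it is not rejected); the function name and returned labels are hypothetical.

```python
def select_hypothesis(h2_verified, h1_verified):
    """Pick a rectification hypothesis; returns (label, rectification_enabled)."""
    if h2_verified:          # region-growing hypothesis has highest priority
        return ("H2", True)
    if h1_verified:          # fall back to the line-groups hypothesis
        return ("H1", True)
    return ("H'", False)     # normal pose: geometric rectification disabled
```

Ordering the checks this way means $H_1$ is only consulted when $H_2$ fails, which matches the decision order in the text.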
Full geometric rectification requires that the relative lengths of the document's sides be known, so it is not always possible. When full geometric rectification is not possible, an approximation may be used to estimate the camera perspective geometry and the document normal.
When the camera focal length is not known a priori or when the estimate of the focal length is not reliable, for example, when estF is “false,” the default focal length may be used. The rectified image size may be determined 118, as illustrated in relation to
where $\mathbf{n}_p$ is the unit normal to the document plane 610 and $\cdot$ denotes the vector dot product. The points [p1, p2, p3, p4] define the document rectangle in the document plane 610, which lies a distance of D units from the camera projection origin 602. In some embodiments of the present invention, $D=5\hat{f}$, where $\hat{f}$ is the focal length estimate.
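Because the equation for the plane points is not reproduced here, the following sketch assumes the standard ray-plane intersection: each quadrilateral point, expressed relative to the optical center, defines a ray $\mathbf{r}_i = [x_i, y_i, \hat{f}]$ from the projection origin, and its intersection with the plane $\mathbf{n}_p \cdot \mathbf{p} = D$ is $\mathbf{p}_i = D\,\mathbf{r}_i / (\mathbf{n}_p \cdot \mathbf{r}_i)$. It also assumes $\mathbf{n}_p$ is oriented so that $\mathbf{n}_p \cdot \mathbf{r}_i > 0$. The function name is hypothetical.

```python
def project_to_document_plane(q_pts, f_hat, n_p, D=None):
    """Intersect camera rays through image points with the plane n_p . p = D.

    q_pts -- image points (x, y), ASSUMED relative to the optical center
    f_hat -- focal length estimate
    n_p   -- unit normal of the document plane, oriented away from the camera
    D     -- plane distance from the projection origin (default 5 * f_hat)
    """
    if D is None:
        D = 5.0 * f_hat
    plane_pts = []
    for x, y in q_pts:
        r = (x, y, f_hat)                              # ray direction
        denom = sum(a * b for a, b in zip(n_p, r))     # n_p . r
        if denom <= 0:
            raise ValueError("ray does not meet the plane in front of the camera")
        t = D / denom                                  # ray parameter at the plane
        plane_pts.append(tuple(t * c for c in r))
    return plane_pts
```

For a fronto-parallel plane, $\mathbf{n}_p=[0,0,1]$, every projected point has depth $D$, so the sketch reduces to a uniform scaling of the image quadrilateral by $D/\hat{f}$.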
The document aspect ratio, which may be denoted Â, may be estimated according to
where Ĥ and Ŵ are the estimated document height and width, respectively, in the document plane 610, and Ĥ may be determined according to Ĥ=∥p4−p1∥ and Ŵ may be determined according to Ŵ=∥p2−p1∥.
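Given the plane points, $\hat{H}$ and $\hat{W}$ follow directly from the definitions above. The sketch assumes the aspect ratio is the height-to-width ratio $\hat{A} = \hat{H}/\hat{W}$, since the defining equation is not reproduced in this text; the function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def document_aspect_ratio(plane_pts):
    """Estimate A-hat from the document-plane points [p1, p2, p3, p4].

    ASSUMPTION: A-hat is defined as H-hat / W-hat.
    """
    p1, p2, _p3, p4 = plane_pts
    h_hat = math.dist(p4, p1)   # H-hat = ||p4 - p1||
    w_hat = math.dist(p2, p1)   # W-hat = ||p2 - p1||
    return h_hat / w_hat
```

Because the lengths are measured in the document plane rather than the image, the ratio is unaffected by the perspective foreshortening visible in the captured quadrilateral.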
The scale of the rectified image may be determined from the quadrilateral points, [q1, q2, q3, q4], and the document aspect ratio, Â. The rectified image size, which may be denoted (ŵ, ĥ), may be determined according to:
where $s_h=\min(\max(d_{h1}, d_{h2}), imgW)$ is the maximum horizontal line length clipped to the input image width, imgW; $s_v=\min(\max(d_{v1}, d_{v2}), imgH)$ is the maximum vertical line length clipped to the input image height, imgH; $d_{h1}=\lVert q_2-q_1\rVert$ is the length of the line between the upper quadrilateral points; $d_{h2}=\lVert q_3-q_4\rVert$ is the length of the line between the lower quadrilateral points; $d_{v1}=\lVert q_4-q_1\rVert$ is the length of the line between the left-most quadrilateral points; and $d_{v2}=\lVert q_3-q_2\rVert$ is the length of the line between the right-most quadrilateral points.
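The clipped side lengths $s_h$ and $s_v$ can be computed directly from the quadrilateral points. This sketch assumes the points are ordered $q_1$ upper-left, $q_2$ upper-right, $q_3$ lower-right, $q_4$ lower-left, matching the side definitions above; it stops short of computing $(\hat{w}, \hat{h})$, because the equation combining $s_h$, $s_v$ and $\hat{A}$ is not reproduced here. The function name is hypothetical.

```python
import math


def clipped_side_lengths(quad, img_w, img_h):
    """Return (s_h, s_v) for quadrilateral points [q1, q2, q3, q4]."""
    q1, q2, q3, q4 = quad
    dh1 = math.dist(q2, q1)   # upper side
    dh2 = math.dist(q3, q4)   # lower side
    dv1 = math.dist(q4, q1)   # left side
    dv2 = math.dist(q3, q2)   # right side
    s_h = min(max(dh1, dh2), img_w)   # longest horizontal side, clipped to width
    s_v = min(max(dv1, dv2), img_h)   # longest vertical side, clipped to height
    return s_h, s_v
```

Taking the longer of each pair of opposite sides preserves the resolution of the side nearest the camera, while clipping keeps the rectified image no larger than the input.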
Referring to
Although the charts and diagrams in the figures may show a specific order of execution, it is understood that the order of execution may differ from that which is depicted. For example, the order of execution of the blocks may be changed relative to the shown order. As a further example, two or more blocks shown in succession in a figure may be executed concurrently, or with partial concurrence. It is understood by those of ordinary skill in the art that software, hardware and/or firmware may be created to carry out the various logical functions described herein.
Some embodiments of the present invention may comprise a computer program product comprising a computer-readable storage medium having instructions stored thereon/in which may be used to program a computing system to perform any of the features and methods described herein. Exemplary computer-readable storage media may include, but are not limited to, flash memory devices, disk storage media, for example, floppy disks, optical disks, magneto-optical disks, Digital Versatile Discs (DVDs), Compact Discs (CDs), micro-drives and other disk storage media, Read-Only Memory (ROMs), Programmable Read-Only Memory (PROMs), Erasable Programmable Read-Only Memory (EPROMs), Electrically Erasable Programmable Read-Only Memory (EEPROMs), Random-Access Memory (RAMs), Video Random-Access Memory (VRAMs), Dynamic Random-Access Memory (DRAMs) and any type of media or device suitable for storing instructions and/or data.
The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding equivalence of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.
Number | Date | Country |
---|---|---|
20130094764 A1 | Apr 2013 | US |