Computing devices, such as smartphones, often include cameras to capture various types of images. Images may be of different kinds of subjects, such as people, places, items of interest, and the like. Further, it has become increasingly common for such cameras to be used to capture images of documents.
Images of documents may be captured with a camera, such as a smartphone camera. A user may intend to “scan” a document by taking a digital photograph of the document. However, the resulting photograph is often warped due to the angle of the camera, the optics of the camera, or imprecise placement of the document. In addition, when photographing a page of a book, due to the binding of the book, the page may tend to curve and may be difficult for the user to manually flatten.
A digital photograph of a document page may be dewarped by a process that models document boundaries as curves, such as polynomial curves. Image analysis may be performed on a digital photograph of a document page or other item having straight boundaries to obtain curves that define the appearance of the boundaries in the photograph. For example, a rectangular document page may be represented by four boundary curves. A transformation may then be computed using the curves. The transformation may be used to transform pixel coordinates in the image to pixel coordinates in a dewarped image that approximates a scan obtained if a flatbed scanner or similar device were to be used. Further, an off-the-shelf mobile computing device, such as a smartphone, may be used, rather than using specialized scanning equipment. A processing-intensive analysis of text direction or text flow to model, for example, a warped document page as a mesh or grid is not required. Document content may be ignored. As such, both text and image heavy documents may be accurately scanned by a user with his/her smartphone or similar device.
The processor 102 may include a central processing unit (CPU), a microcontroller, a microprocessor, a processing core, a field-programmable gate array (FPGA), or a similar device capable of executing instructions. The processor 102 may be connected to the non-transitory machine-readable medium 104. The processor 102 and medium 104 may cooperate to execute instructions.
The non-transitory machine-readable medium 104 may include an electronic, magnetic, optical, or other physical storage device that encodes executable instructions. The medium 104 may include, for example, random access memory (RAM), read-only memory (ROM), electrically-erasable programmable read-only memory (EEPROM), flash memory, a storage drive, an optical disc, or similar.
Image dewarp instructions 106 may be stored in the non-transitory machine-readable medium 104 to be executed by the processor 102.
The dewarp instructions 106 detect boundaries of a representation of a document page in a captured image 108. Further, the dewarp instructions 106 model the boundaries of the representation of the document page as nonlinear curves 110. Then, the image dewarp instructions 106 use the nonlinear curves 110 to transform pixels of the representation of the document page into pixels of a dewarped representation of the document page. Finally, the instructions 106 output a dewarped image 112 based on the dewarped representation of the document page.
The captured image 108, nonlinear curves 110, and dewarped image 112 may be stored in the medium 104 for purposes of the execution of the image dewarp instructions 106.
The captured image 108 may be captured by the device 100, such as by a digital camera at the device 100, or may be captured by another device. The captured image 108 includes a representation of a document page. This type of image may be captured by taking a digital photograph of a book page, a loose piece of paper, or similar document page. When the device 100 is held by the user to take such a photograph, the representation of the document page in the captured image 108 may not align with the borders of the image and may include warped or curved boundaries for the page.
Modelling the boundaries of the representation of the document page as nonlinear curves 110 may include performing edge detection on the captured image 108 to obtain an edge-detected image. Line segment detection may then be performed on the edge-detected image to obtain line segments that may form parts of the boundaries of the document page.
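As a concrete illustration, these two steps might be implemented with an off-the-shelf image processing library such as OpenCV. The following is a minimal sketch, not the claimed implementation; the Canny thresholds are illustrative assumptions, and cv2.HoughLinesP may be substituted in builds where the line segment detector is unavailable.

```python
# Minimal sketch of edge detection followed by line segment detection,
# assuming OpenCV (cv2) and NumPy. Thresholds are illustrative only.
import cv2
import numpy as np

def detect_line_segments(captured_image: np.ndarray) -> np.ndarray:
    """Return an (N, 4) array of line segments (x1, y1, x2, y2)."""
    gray = cv2.cvtColor(captured_image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)  # edge-detected image
    detector = cv2.createLineSegmentDetector()
    segments, _, _, _ = detector.detect(edges)  # line segment detection
    if segments is None:
        return np.empty((0, 4))
    return segments.reshape(-1, 4)
```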
Line segments may be associated into groups with reference to endpoint proximity and the angle between pairs of line segments.
Iteration may be performed to compare pairs of line segments. For each pair of line segments, the pair may be determined to belong to the same group if a set of conditions is met. An example set of conditions is as follows: first, an endpoint of one line segment is within a predefined distance of an endpoint of the other line segment; and second, the mutual angle, γ (gamma), between the two line segments is within a predefined angular range.
The predefined distance and predefined angular range may be set, independently, to suitable values. The predefined distance may be set to promote connections, so that line segments are readily grouped. The predefined angular range may be set based on the expected curvature of document boundaries in captured images. For example, document pages laid flat may not require as large a predefined angular range as pages of an open book. In implementations useful for book page scanning, the predefined angular range may be set larger than in implementations that mainly consider loose-leaf pages.
After pairs of line segments have been considered for grouping, a collection of line segment groups is obtained. Each line segment group defines a polyline that potentially represents an outer boundary of the document page. It should be understood that the line segments of a group need not have ends connected (i.e., located at the same coordinates) to be considered members of the group. It is sufficient for endpoints to be proximate and for the mutual angle, γ (gamma), to be within an acceptable range.
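One way to express the pairwise grouping test and the subsequent group formation in code follows. This is a hedged sketch: the segment representation, the threshold defaults, and the union-find merging strategy are assumptions introduced for illustration rather than prescribed details.

```python
# Sketch of the pairwise grouping test and group formation. The threshold
# defaults and the union-find merge are illustrative assumptions.
import math

def may_group(seg_a, seg_b, max_endpoint_dist=15.0, max_angle_deg=20.0):
    """True if two segments (x1, y1, x2, y2) satisfy both grouping conditions."""
    ax1, ay1, ax2, ay2 = seg_a
    bx1, by1, bx2, by2 = seg_b
    # Condition 1: some pair of endpoints lies within the predefined distance.
    close = any(math.dist(p, q) <= max_endpoint_dist
                for p in ((ax1, ay1), (ax2, ay2))
                for q in ((bx1, by1), (bx2, by2)))
    # Condition 2: the mutual angle gamma is within the predefined range.
    ang_a = math.atan2(ay2 - ay1, ax2 - ax1)
    ang_b = math.atan2(by2 - by1, bx2 - bx1)
    gamma = abs(ang_a - ang_b) % math.pi
    gamma = min(gamma, math.pi - gamma)  # direction-insensitive angle
    return close and math.degrees(gamma) <= max_angle_deg

def group_segments(segments, **thresholds):
    """Union-find over the pairwise test to form line segment groups."""
    parent = list(range(len(segments)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if may_group(segments[i], segments[j], **thresholds):
                parent[find(i)] = find(j)
    groups = {}
    for i, seg in enumerate(segments):
        groups.setdefault(find(i), []).append(tuple(seg))
    return list(groups.values())
```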
Once line segments have been grouped, the line segment groups may be considered as candidates for boundaries of the document page. The boundary-representative line segment groups may then be discriminated from the other line segment groups detected in the captured image.
For example, to determine the four polylines that represent the four boundaries of a document, the line segment groups may be divided into four categories, where each category is associated with one boundary of the document. To achieve this, top and bottom document areas may be defined by a horizontal axis. Left and right document areas may be defined by a vertical axis. The horizontal and vertical axes may be selected to bisect the captured image based on the premise that the document is the central subject of the image. A line segment group that is located above the horizontal axis may be taken as a candidate for the document's upper boundary. A line segment group that is located below the horizontal axis may be taken as a candidate for the document's lower boundary. A line segment group that is located to the left of the vertical axis may be taken as a candidate for the document's left boundary. A line segment group that is located to the right of the vertical axis may be taken as a candidate for the document's right boundary. As such, each line segment group may be categorized as upper, lower, left, or right.
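A categorization along these lines might be sketched as follows. The use of a group's centroid and dominant extent to decide between the horizontal and vertical tests is an assumption; the description above leaves that choice open.

```python
# Sketch of categorizing line segment groups by the bisecting axes.
# Using the centroid and the group's dominant extent is an assumption.
def categorize_groups(groups, image_width, image_height):
    """Assign each group (list of (x1, y1, x2, y2)) to a boundary category."""
    cx, cy = image_width / 2.0, image_height / 2.0
    categories = {"upper": [], "lower": [], "left": [], "right": []}
    for group in groups:
        xs = [v for s in group for v in (s[0], s[2])]
        ys = [v for s in group for v in (s[1], s[3])]
        gx, gy = sum(xs) / len(xs), sum(ys) / len(ys)
        # A mostly-horizontal group is upper/lower; otherwise left/right.
        if max(xs) - min(xs) >= max(ys) - min(ys):
            categories["upper" if gy < cy else "lower"].append(group)
        else:
            categories["left" if gx < cx else "right"].append(group)
    return categories
```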
Then, one line segment group is selected from each category to represent each of the four boundaries of the document. The selected line segment groups are those that are longest and have the nearest endpoints.
Each line segment group has two endpoints (e.g., 540, 542), which are the opposite endpoints of the furthest separated constituent line segments. Endpoint proximity may be computed for pairs of line segment groups in adjacent categories (upper 500, lower 502, left 504, right 506). For example, line segment groups in the upper category may have endpoint distances computed with respect to line segment groups in the left and right categories, and likewise for line segment groups in the lower category. Similarly, line segment groups in the left and right categories may have endpoint distances computed with respect to line segment groups in the upper and lower categories.
Each line segment group may be assigned a score based on its endpoint proximity to line segment groups in neighboring categories and its length. The line segment group with the highest score in each category (upper 500, lower 502, left 504, right 506) may be selected as the representative of the respective page boundary. Weightings may be applied to arrive at a score, with the intent of identifying line segment groups that are longest and have the closest endpoints.
In some examples, all combinations of upper 500, lower 502, left 504, and right 506 line segment groups are enumerated and a total distance between endpoints of the line segment groups is computed for each combination. The combination with the smallest total distance is selected as the best representative of the page boundaries. Total distance may be considered a type of score, in which smaller total distances are considered to be higher scores.
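The combination-enumeration example might look like the following sketch. The helper names and the choice of summing the four corner gaps are assumptions consistent with, but not dictated by, the description above.

```python
# Sketch of selecting one group per category by minimizing the total
# distance between endpoints over all combinations.
import itertools
import math

def group_endpoints(group):
    """The two most separated endpoints of a group's constituent segments."""
    pts = [(s[0], s[1]) for s in group] + [(s[2], s[3]) for s in group]
    return max(itertools.combinations(pts, 2),
               key=lambda pq: math.dist(pq[0], pq[1]))

def corner_gap(group_a, group_b):
    """Smallest endpoint-to-endpoint distance between two groups."""
    return min(math.dist(p, q)
               for p in group_endpoints(group_a)
               for q in group_endpoints(group_b))

def select_boundaries(categories):
    """Pick the (upper, lower, left, right) combination with least total gap."""
    best, best_total = None, math.inf
    for combo in itertools.product(categories["upper"], categories["lower"],
                                   categories["left"], categories["right"]):
        upper, lower, left, right = combo
        total = (corner_gap(upper, left) + corner_gap(upper, right) +
                 corner_gap(lower, left) + corner_gap(lower, right))
        if total < best_total:
            best, best_total = combo, total
    return best
```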
The discriminated boundaries of the document, as represented by the selected line segment groups 520, 522, 524, 526, are then fitted to nonlinear curves. Each selected line segment group 520, 522, 524, 526 may be approximated by a polynomial equation. A least squares method may be used.
The polynomial curves 530, 532, 534, 536 obtained by the fitting are the nonlinear curves 110 that model the boundaries of the document page.
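A least squares polynomial fit of this kind might be sketched as follows, assuming NumPy; the cubic degree and the horizontal/vertical parameterization are illustrative choices rather than prescribed details.

```python
# Sketch of least squares polynomial fitting of a selected group's
# segment endpoints. The cubic degree is an illustrative assumption.
import numpy as np

def fit_boundary_poly(group, degree=3, horizontal=True):
    """Fit y = p(x) for upper/lower boundaries, or x = p(y) for left/right."""
    xs = np.array([v for s in group for v in (s[0], s[2])], dtype=float)
    ys = np.array([v for s in group for v in (s[1], s[3])], dtype=float)
    if horizontal:
        return np.poly1d(np.polyfit(xs, ys, degree))  # y as a function of x
    return np.poly1d(np.polyfit(ys, xs, degree))      # x as a function of y
```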
The nonlinear curves 110 may then be used to transform the pixels of the representation of the document page in the captured image 108 into pixels of the dewarped image 112. The upper, lower, left, and right boundaries define lengths Lt, Lb, Ll, and Lr, respectively, which appear as curved lengths in the captured image 108 and as straight lengths in the dewarped image 112.
For each pixel 600 of the dewarped image 112, the pixel's position within the rectangular boundaries of the dewarped image 112 may be used in an interpolation to identify a source pixel 602 within the area of the captured image 108 bounded by the nonlinear curves 110. Linear interpolation may be used. For example, a dewarped pixel 600 may be determined to be a certain distance along the length Lt in the dewarped image 112. Such distance may be normalized to the length Lt, for example, represented as 0 to 1, where 0 is at one end of the length Lt and 1 is at the opposite end of the length Lt. Then the true position of the corresponding source pixel 602, along the length Lt of the curved boundary in the captured image 108, may be computed using the same normalized distance. The same applies to lengths Lb, Ll, and Lr. The influence of a pixel's normalized distance along a length Lt, Lb, Ll, Lr may be weighted based on the distance of that pixel from that length Lt, Lb, Ll, Lr. For example, pixels 600 near the upper length Lt in the dewarped image 112 may have the identification of their source pixels 602 in the captured image 108 influenced more by the upper boundary curve than by the lower boundary curve.
As such, source pixel information may be geometrically transformed into dewarped pixel information.
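One standard way to realize this distance-weighted blend of the four boundary curves is a Coons-patch style interpolation; presenting it that way is an assumption, not necessarily the claimed transformation. The sketch below assumes NumPy and OpenCV, along with the fit_boundary_poly helper sketched above.

```python
# Sketch of mapping each dewarped pixel to a source pixel by blending the
# four boundary curves, weighted by normalized distance (a Coons-style
# blend, which is an assumption rather than the prescribed transformation).
import numpy as np
import cv2

def curve_from_poly(poly, a, b, horizontal=True):
    """Wrap a fitted poly1d as a parametric (x, y) curve over [0, 1]."""
    def curve(t):
        s = a + t * (b - a)
        return (s, poly(s)) if horizontal else (poly(s), s)
    return curve

def dewarp(captured, top, bottom, left, right, out_w=800, out_h=1000):
    """top/bottom take u in [0, 1]; left/right take v in [0, 1]."""
    u = np.linspace(0.0, 1.0, out_w)[None, :]   # across lengths Lt, Lb
    v = np.linspace(0.0, 1.0, out_h)[:, None]   # along lengths Ll, Lr
    t, b = top(u), bottom(u)
    l, r = left(v), right(v)
    p00, p10 = top(0.0), top(1.0)               # corners of the page
    p01, p11 = bottom(0.0), bottom(1.0)
    maps = []
    for i in (0, 1):  # 0: x-coordinates, 1: y-coordinates
        blend = ((1 - v) * t[i] + v * b[i] +
                 (1 - u) * l[i] + u * r[i] -
                 ((1 - u) * (1 - v) * p00[i] + u * (1 - v) * p10[i] +
                  (1 - u) * v * p01[i] + u * v * p11[i]))
        maps.append(blend.astype(np.float32))
    return cv2.remap(captured, maps[0], maps[1], cv2.INTER_LINEAR)
```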
At block 702, boundaries of a representation of a document page are detected in a captured image. This may include identifying line segments, such as by using edge detection and line segment detection, and then connecting or associating line segments that have nearby endpoints and similar angles. Candidate document page boundaries may be initially represented as groups of line segments that are apparently connected.
At block 704, the boundaries of the representation of the document page are modelled as nonlinear curves. For example, the polyline candidate document page boundaries may be compared for endpoint proximity and length, with polylines that are longer and that have closer endpoints being favored. Then, with reference to the principle that the captured image contains the document page as the main subject, in an orientation that places its boundaries generally aligned with the image boundaries, a suitable group of line segments is selected to map to each of the four document page boundaries. Each group of selected line segments is then fitted to a nonlinear curve, such as a polynomial curve. One polynomial curve may be obtained for each of the four linear outside boundaries of a rectangular document page.
At block 706, the four nonlinear curves are used to transform pixels of the captured image into pixels of a dewarped image. Interpolation may be used to map each pixel in the dewarped image to a source pixel in the captured image. The dewarped image may thus contain a dewarped representation of the document page that was the main subject of the captured image.
Then, at block 708, the dewarped image is outputted. For example, the dewarped image may be saved to a non-transitory computer-readable medium, may be transmitted over a computer network, may be displayed to a user via a display device, or similar.
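Blocks 702 to 708 could be tied together roughly as follows, reusing the hedged helper sketches above (detect_line_segments, group_segments, categorize_groups, select_boundaries, fit_boundary_poly, curve_from_poly, and dewarp); all of these names are assumptions introduced for illustration.

```python
# Sketch tying blocks 702-708 together, assuming the helpers above.
import cv2

def extent(group, idx):
    """Min and max of a group's endpoint coordinates along one axis."""
    vals = [v for s in group for v in (s[idx], s[idx + 2])]
    return min(vals), max(vals)

def scan_document(path_in, path_out):
    captured = cv2.imread(path_in)
    h, w = captured.shape[:2]
    segments = detect_line_segments(captured)          # block 702
    groups = group_segments(segments)
    cats = categorize_groups(groups, w, h)             # block 704
    upper, lower, left, right = select_boundaries(cats)
    top = curve_from_poly(fit_boundary_poly(upper), *extent(upper, 0))
    bot = curve_from_poly(fit_boundary_poly(lower), *extent(lower, 0))
    lft = curve_from_poly(fit_boundary_poly(left, horizontal=False),
                          *extent(left, 1), horizontal=False)
    rgt = curve_from_poly(fit_boundary_poly(right, horizontal=False),
                          *extent(right, 1), horizontal=False)
    dewarped = dewarp(captured, top, bot, lft, rgt)    # block 706
    cv2.imwrite(path_out, dewarped)                    # block 708
```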
At block 802, an image of a generally rectangular document page is captured. For example, a user may place a document on a surface and use a handheld computing device to capture an image of the document.
At block 804, line segment detection is performed on the captured image. Line segment detection may include edge detection.
At block 806, line segments are connected based on endpoint proximity and relative angle. The grouping techniques discussed above may be used.
At block 808, each set of connected line segments may be considered a candidate for the best representative of a particular document boundary (e.g., upper, lower, left, right). Each set of connected line segments has two endpoints, which are the opposite endpoints of the first and last constituent line segments. Sets of connected line segments may be tested for endpoint proximity and length to identify four sets of connected line segments that best represent the four boundaries of the document page.
At block 810, the selected connected line segments are each modelled as a nonlinear curve. Polynomial curve fitting may be used. The nonlinear curves describe the boundaries of the document page, as apparent in the captured image. It should be noted that the process 800 does not refer to the content of the captured image to obtain the nonlinear curves.
Then, at block 812, the nonlinear curves are used in an interpolation or transformation to convert pixels in the captured image to pixels in a dewarped image, which may then be outputted. The dewarped image thus compensates for warping or other curvature in the document content, which may be caused by the conditions of taking the captured image, optical effects of the camera taking the captured image, and similar defects.
The device 900 may further include a camera 902 connected to the processor 102. The camera 902 may be used to capture images in the vicinity of the device 900, such as images of documents.
The device 900 may further include a display 904, such as a touchscreen display. The display 904 may be used to display information to the user of the device 900, such as captured images of documents and dewarped versions of such captured images.
The device 900 may further include a communications interface 906 to communicate data with a computer network. The communications interface 906 may include a wireless interface, such as an Institute of Electrical and Electronics Engineers (IEEE) 802.11 interface, a mobile/cellular network interface, or similar. The device 900 may thus share information, such as images, with other devices and with computer servers.
The device 900 may include dewarp instructions 106 to model boundaries of a document page in a captured image 108 as nonlinear curves 110. The dewarp instructions 106 may further use the nonlinear curves 110 to generate a dewarped image 112 of the document page. Dewarped images 112 may be stored at the non-transitory computer-readable medium 104, displayed at the display 904, or communicated via the communications interface 906.
During the processing of a captured image 108 to obtain a dewarped image 112, the medium 104 may be used to store relevant information, such as an edge-detected image 910 and the nonlinear curves 110.
In view of the above, it should be apparent that an image of a document may be dewarped without analysis or knowledge of document content, such as text. An efficient tradeoff between speed and accuracy may be obtained, particularly by recognizing that captured images of documents tend to have certain characteristics, as discussed above. Image dewarping of documents may be fast enough to meet expectations of handheld device users and accurate enough to sufficiently approximate a flatbed scanner or other specialized equipment.
It should be recognized that features and aspects of the various examples provided above can be combined into further examples that also fall within the scope of the present disclosure. In addition, the figures are not to scale and may have size and shape exaggerated for illustrative purposes.